The skewness of a probability distribution

In this post, we discuss how to calculate the moment coefficient of skewness and also discuss some issues surrounding the notion of skewness.

________________________________________________________________________

Looking at graphs

One informal but useful way of checking the skewness of a distribution is to look at the density curve (or a histogram). Consider the following density functions.

Figure 1

Figure 2

The density curve in Figure 1 has a longer tail to the right than to the left, so it is skewed to the right. It is also said to be positively skewed since its coefficient of skewness is positive. The density curve in Figure 2 has a longer tail to the left than to the right, so it is skewed to the left. It is also said to be negatively skewed since its skewness coefficient is negative. If a density curve looks the same to the left and to the right of its center (such as the bell curve for the normal distribution), then the distribution is symmetric and its skewness coefficient is zero.

The distribution in Figure 1 is a right skewed distribution (the longer tail is on the right). It is a gamma distribution (with $\alpha=2$ and $\beta=1$; see Example 1 below) with mean 2 and median approximately 1.678347. The mode (the highest peak) is at x = 1. The distribution in Figure 2 is a left skewed distribution (the longer tail is on the left). It is a beta distribution (with $\alpha=20$ and $\beta=2$; see Example 2 below) with mean and median approximately 0.909 and 0.9213562, respectively. The mode is at x = 0.95.
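The mean, median and mode quoted above can be reproduced with a short computation. The following Python sketch (Python is our choice; the post itself has no code) uses the closed-form CDFs of the two distributions, $1-e^{-x}(1+x)$ for the gamma with $\alpha=2$, $\beta=1$ and $21x^{20}-20x^{21}$ for the beta with $\alpha=20$, $\beta=2$, and finds each median by bisection. The helper names are hypothetical, not taken from any library.

```python
import math


def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection; assumes f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid   # sign change is in the upper half
        else:
            hi = mid                # sign change is in the lower half
    return (lo + hi) / 2


def gamma_2_1_cdf(x):
    """CDF of the gamma distribution with alpha = 2, beta = 1 (Figure 1)."""
    return 1 - math.exp(-x) * (1 + x)


def beta_20_2_cdf(x):
    """CDF of the beta distribution with alpha = 20, beta = 2 (Figure 2).

    The density is 420 x^19 (1 - x) on (0, 1), which integrates to 21 x^20 - 20 x^21.
    """
    return 21 * x**20 - 20 * x**21


gamma_mean, gamma_mode = 2 / 1, (2 - 1) / 1      # alpha/beta and (alpha-1)/beta
gamma_median = bisect_root(lambda x: gamma_2_1_cdf(x) - 0.5, 0.0, 10.0)

beta_mean = 20 / (20 + 2)                        # alpha/(alpha+beta)
beta_mode = (20 - 1) / (20 + 2 - 2)              # (alpha-1)/(alpha+beta-2)
beta_median = bisect_root(lambda x: beta_20_2_cdf(x) - 0.5, 0.0, 1.0)

print(f"Figure 1: mode={gamma_mode:.4f}, median={gamma_median:.6f}, mean={gamma_mean:.4f}")
print(f"Figure 2: mean={beta_mean:.4f}, median={beta_median:.7f}, mode={beta_mode:.4f}")
```

Bisection is used only because it needs nothing beyond the standard library; any root finder would do.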

________________________________________________________________________

In the distribution for Figure 1, we have "mode < median < mean". In the distribution for Figure 2, we have "mean < median < mode". A common conception is that these simple rules characterize all skewed distributions, i.e., in a right skewed distribution the mean is to the right of the median, which in turn is to the right of the mode, and in a left skewed distribution the mean is to the left of the median, which in turn is to the left of the mode. Such rules are easy to remember and are stated in some statistics textbooks. The rule of thumb certainly holds for the above two figures, but it turns out that it does not hold in many instances. The above two graphs are "textbook" demonstrations of skewness. They are a gamma distribution and a beta distribution, and they behave well according to the usual notion of what skewed distributions should look like. In a later section of this post, we discuss this issue in greater detail. First we define the coefficient of skewness.

________________________________________________________________________

Pearson moment coefficient of skewness

The measure of skewness defined here is called the Pearson moment coefficient of skewness. It provides information about both the amount and the direction of the departure from symmetry. Its value can be positive, negative, or even undefined. The larger the absolute value of the skewness measure, the more asymmetric the distribution. The skewness of a symmetric distribution is zero (when it is defined), and the skewness of a nearly symmetric distribution is near zero.

To help put the definition of skewness in context, we first define raw moments and central moments of a random variable $X$. The $k$th raw moment of $X$ is $E(X^k)$, the expected value of the $k$th power of the random variable $X$. The first raw moment is the mean of the random variable and is usually denoted by $\mu$.

The $k$th central moment of a random variable $X$ is $E[(X-\mu)^k]$, the expected value of the $k$th power of the deviation of the variable from its mean. The moment $E[(X-\mu)^k]$ is usually denoted by $\mu_k$. The second central moment is usually called the variance and is denoted by $\sigma^2$. The square root of $\sigma^2$, $\sigma$, is the standard deviation.

The ratio of the standard deviation to the mean, $\displaystyle \frac{\sigma}{\mu}$, is called the coefficient of variation.

The ratio of the third central moment to the cube of the standard deviation is called Pearson’s moment coefficient of skewness (or the coefficient of skewness) and is denoted by $\gamma_1$.

$\displaystyle \gamma_1=\frac{E[ (X-\mu)^3 ]}{\sigma^3}=\frac{\mu_3}{\sigma^3} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

The skewness in (1) can be expanded to derive a version that can be calculated more easily:

$\displaystyle \begin{aligned} \gamma_1&=\frac{E[ (X-\mu)^3 ]}{\sigma^3} \\&=\frac{E(X^3)-3 \mu E(X^2)+3 \mu^2 E(X)-\mu^3}{\sigma^3} \\&=\frac{E(X^3)-3 \mu [E(X^2)-\mu E(X)]-\mu^3}{\sigma^3} \\&=\frac{E(X^3)-3 \mu \sigma^2-\mu^3}{\sigma^3} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2) \\&=\frac{E(X^3)-3 \mu \sigma^2-\mu^3}{(\sigma^2)^{\frac{3}{2}}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3) \end{aligned}$

The last version (3) is in terms of the first raw moment $\mu$, the second central moment $\sigma^2$ and the third raw moment $E(X^3)$. Since $\sigma^2=E(X^2)-\mu^2$, the coefficient $\gamma_1$ can be obtained via (3) by first computing the first three raw moments.
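As a minimal sketch of formula (3), the following Python function (a hypothetical helper, not a library API) takes the first three raw moments and returns $\gamma_1$. The illustration plugs in the raw moments 2, 6 and 24 of the gamma distribution with $\alpha=2$ and $\beta=1$ from Example 1, which should give $\sqrt{2}$.

```python
import math


def skewness_from_raw_moments(m1, m2, m3):
    """Pearson moment coefficient of skewness via formula (3).

    m1, m2, m3 are the raw moments E(X), E(X^2), E(X^3).
    """
    var = m2 - m1**2                              # sigma^2 = E(X^2) - mu^2
    if var <= 0:
        raise ValueError("variance must be positive")
    third_central = m3 - 3 * m1 * var - m1**3     # numerator of (2) and (3)
    return third_central / var**1.5               # divide by (sigma^2)^(3/2)


# Gamma with alpha = 2, beta = 1: E(X) = 2, E(X^2) = 6, E(X^3) = 24
g1 = skewness_from_raw_moments(2, 6, 24)
print(g1, math.sqrt(2))
```

Working from raw moments is convenient because, as in the examples below, raw moments often have simple closed forms while central moments do not.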

Even though kurtosis is not the focus of this post, we state it here to complete the brief discussion of moments. The ratio of the fourth central moment to the fourth power of the standard deviation, $\displaystyle \gamma_2=\frac{\mu_4}{\sigma^4}$, is called the kurtosis.

________________________________________________________________________

Examples

In this section, we discuss the skewness in two familiar families of continuous distributions – gamma and beta. We also demonstrate how exponentiation can affect skewness.

Example 1: Gamma Distribution
The following is the probability density function of the gamma distribution.

$\displaystyle f(x)=\frac{\beta^\alpha}{\Gamma(\alpha)} \ x^{\alpha-1} \ e^{-\beta x} \ \ \ \ \ \ \ \ \ x>0$

where $\Gamma(\cdot)$ is the gamma function, and $\alpha$ and $\beta$ are parameters such that $\alpha>0$ and $\beta>0$. The number $\alpha$ is the shape parameter and the number $\beta$ here is the rate parameter. Figure 1 shows the gamma distribution with $\alpha=2$ and $\beta=1$. When $\alpha=1$, we obtain the exponential distribution. When $\beta=\frac{1}{2}$ and $\alpha=\frac{k}{2}$ where $k$ is a positive integer, we obtain the chi square distribution with $k$ degrees of freedom.

Let $X$ be a random variable with the above gamma density function. The raw moments $E(X^k)$, where $k=1,2,3,\cdots$, are:

$\displaystyle E(X^k)=\frac{(\alpha+k-1)(\alpha+k-2) \cdots \alpha}{\beta^k}$

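The first two raw moments give the variance $\displaystyle \sigma^2=\frac{(\alpha+1)\alpha}{\beta^2}-\frac{\alpha^2}{\beta^2}=\frac{\alpha}{\beta^2}$. Combining this with the third raw moment, the following calculates the moment coefficient of skewness based on the form in (3):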

$\displaystyle \gamma_1=\frac{\displaystyle \frac{(\alpha+2)(\alpha+1)\alpha}{\beta^3}-3 \frac{\alpha}{\beta} \frac{\alpha}{\beta^2}-\frac{\alpha^3}{\beta^3}}{\biggl( \displaystyle \frac{\alpha}{\beta^2} \biggr)^{\frac{3}{2}}}=\frac{2}{\sqrt{\alpha}}$

The above calculation shows that the rate parameter $\beta$ has no effect on skewness. The example in Figure 1 has $\alpha=2$, giving a coefficient of skewness of $\sqrt{2}$ = 1.414213562. In general, the gamma distribution is positively skewed. However, it becomes more and more symmetric as the shape parameter $\alpha \rightarrow \infty$. Figure 3 graphs the gamma densities for $\alpha=1, 2, 3, 5, 6$ and $\beta=1$.

Figure 3

In Figure 3, the light blue density with $\alpha=1$ is an exponential distribution. The red one with $\alpha=2$ is the density in Figure 1. With $\alpha=6$, the gamma density already looks very symmetric (the dark blue).

On the other hand, as the shape parameter $\alpha \rightarrow 0$, the gamma distribution becomes increasingly positively skewed. When $\alpha=\frac{1}{n}$, $\gamma_1=2 \sqrt{n}$. When $n \rightarrow \infty$, $\gamma_1 \rightarrow \infty$.
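The closed form $2/\sqrt{\alpha}$ can be checked numerically. This sketch computes the raw moments $E(X^k)$ from the formula above with exact fractions, applies (3), and compares the result with $2/\sqrt{\alpha}$ for several values of $\alpha$ (including $\alpha=\frac{1}{4}$, where $\gamma_1=2\sqrt{4}=4$) and several rates $\beta$, which also illustrates that $\beta$ does not matter. The parameter grid is arbitrary.

```python
import math
from fractions import Fraction


def gamma_raw_moment(alpha, beta, k):
    """E(X^k) = alpha (alpha+1) ... (alpha+k-1) / beta^k (gamma density, rate beta)."""
    numerator = Fraction(1)
    for j in range(k):
        numerator *= alpha + j
    return numerator / Fraction(beta) ** k


def skewness_from_raw_moments(m1, m2, m3):
    """Formula (3): (E(X^3) - 3 mu sigma^2 - mu^3) / (sigma^2)^(3/2)."""
    var = m2 - m1**2
    return float(m3 - 3 * m1 * var - m1**3) / float(var) ** 1.5


skewness = {}
for alpha in [Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(2), Fraction(6)]:
    for beta in [Fraction(1, 2), Fraction(1), Fraction(3)]:
        m1, m2, m3 = (gamma_raw_moment(alpha, beta, k) for k in (1, 2, 3))
        skewness[(alpha, beta)] = skewness_from_raw_moments(m1, m2, m3)
    print(f"alpha={alpha}: gamma_1={skewness[(alpha, Fraction(1))]:.6f}, "
          f"2/sqrt(alpha)={2 / math.sqrt(alpha):.6f}")
```

Exact fractions keep the cancellation in the numerator of (3) free of rounding error; only the final division is done in floating point.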

Example 2: Beta Distribution
The following is the PDF of a beta distribution:

$\displaystyle f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \ x^{\alpha-1} \ (1-x)^{\beta-1} \ \ \ \ \ \ \ \ \ \ \ \ 0<x<1$

where $\Gamma(\cdot)$ is the gamma function, and $\alpha$ and $\beta$ are parameters such that $\alpha>0$ and $\beta>0$. Both $\alpha$ and $\beta$ are shape parameters.

In the beta family of distributions, the skewness can range from positive to negative. If the $\alpha$ parameter dominates (i.e. $x$ is raised to a higher power and $1-x$ to a smaller power in the density function), then the beta distribution has a negative skew (skewed to the left). This is because, on the interval $(0,1)$, the function $x^n$ has left skew and the function $(1-x)^n$ has right skew, and the skewness of the beta distribution follows the factor that dominates. If the $\beta$ parameter dominates, the beta distribution is skewed to the right. If both parameters are roughly equal, the beta distribution is close to symmetric. For example, when $\alpha=20$ and $\beta=2$, the beta distribution is left skewed (its density curve is in Figure 2). As in the gamma case, the skewness of the beta distribution has a closed form. The following formula confirms the intuition about the skewness of the beta distribution (found here).

$\displaystyle \gamma_1=\frac{2(\beta-\alpha) \ \sqrt{\alpha+\beta+1}}{(\alpha+\beta+2) \ \sqrt{\alpha \ \beta}}$

Thus the beta distribution with $\alpha=20$ and $\beta=2$ has skewness coefficient -1.137431317. The following figure further demonstrates the role the shape parameters play in changing the skewness of the beta distribution.

Figure 4

In Figure 4, as the $\alpha$ parameter goes from 2 to 20, the skewness goes from 1.137431317 to 0.659393193 to 0 to -0.659393193 to -1.137431317.
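A quick consistency check of the closed-form beta skewness: the sketch below evaluates the formula above and, separately, formula (3) using the beta raw moments $E(X^k)=\prod_{j=0}^{k-1}\frac{\alpha+j}{\alpha+\beta+j}$ (a standard result not derived in this post). For simplicity it assumes integer shape parameters so that exact fractions can be used; the function names are hypothetical.

```python
import math
from fractions import Fraction


def beta_skewness_closed_form(a, b):
    """Closed-form skewness of the beta(a, b) distribution (formula in Example 2)."""
    return 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))


def beta_skewness_from_moments(a, b):
    """Same quantity via formula (3), using E(X^k) = prod_{j<k} (a+j)/(a+b+j)."""
    moments = []
    prod = Fraction(1)
    for j in range(3):
        prod *= Fraction(a + j, a + b + j)   # a, b assumed to be integers here
        moments.append(prod)
    m1, m2, m3 = moments
    var = m2 - m1**2
    return float(m3 - 3 * m1 * var - m1**3) / float(var) ** 1.5


for a, b in [(20, 2), (2, 20), (5, 5)]:
    print(a, b, beta_skewness_closed_form(a, b), beta_skewness_from_moments(a, b))
```

Swapping $\alpha$ and $\beta$ flips the sign of the skewness, which matches the mirror-image behavior described above.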

Example 3: Exponentiation
Symmetric distributions have zero coefficient of skewness. Raising a symmetric distribution to a positive power can produce a skewed distribution. For example, let $X$ be the standard normal random variable (mean 0 and variance 1), and let $Y=X^2$. Then $Y$ has a chi-square distribution with 1 degree of freedom, which is a gamma distribution with $\alpha=\frac{1}{2}$ and $\beta=\frac{1}{2}$. According to Example 1 above, its skewness coefficient is $\frac{2}{\sqrt{0.5}}=2 \sqrt{2}=2.828$. Thus squaring a standard normal random variable produces a very strongly positively skewed distribution.
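The same value can be obtained without going through the gamma family. The sketch below uses the standard normal moments $E(X^2)=1$, $E(X^4)=3$, $E(X^6)=15$ (the even moments are $(k-1)(k-3)\cdots 3 \cdot 1$; the standard normal variable is named `Z` in the code) as the first three raw moments of $Y=X^2$ and applies formula (3).

```python
import math


def standard_normal_raw_moment(k):
    """E(Z^k) for standard normal Z: 0 for odd k, (k-1)(k-3)...3*1 for even k."""
    if k % 2 == 1:
        return 0
    result = 1
    for j in range(k - 1, 0, -2):
        result *= j
    return result


# Raw moments of Y = Z^2 are E(Z^2), E(Z^4), E(Z^6)
m1, m2, m3 = (standard_normal_raw_moment(2 * j) for j in (1, 2, 3))
var = m2 - m1**2                                  # Var(Y) = 3 - 1 = 2
g1 = (m3 - 3 * m1 * var - m1**3) / var**1.5       # formula (3): 8 / 2^(3/2)
print(g1, 2 * math.sqrt(2), 2 / math.sqrt(0.5))
```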

Example 4: Exponentiation
Raising a positively skewed distribution to a positive power can produce a more strongly positively skewed distribution. For example, let $X$ be an exponential random variable. Example 1 (with $\alpha=1$) shows that exponential distributions have skewness coefficient 2. We show that the coefficient of skewness for $Y=X^2$ is approximately 6.619.

The density function for the exponential random variable $X$ is $f(x)=\beta e^{-\beta x}$ for $x>0$, where $\beta>0$ is the rate parameter. It can be shown that the raw moments of $X$ are:

$\displaystyle E(X^k)=\frac{k!}{\beta^k} \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

Then the first three moments of $Y$ are:

$\displaystyle E(Y)=E(X^2)=\frac{2}{\beta^2}$

$\displaystyle E(Y^2)=E(X^4)=\frac{24}{\beta^4}$

$\displaystyle E(Y^3)=E(X^6)=\frac{720}{\beta^6}$

The first two raw moments give the variance $\displaystyle Var(Y)=\frac{24}{\beta^4}-\frac{4}{\beta^4}=\frac{20}{\beta^4}$. Then compute $\gamma_1$ via formula (3).

$\displaystyle \gamma_1=\frac{\displaystyle \frac{720}{\beta^6}-3 \ \frac{2}{\beta^2} \ \frac{20}{\beta^4}-\frac{8}{\beta^6}}{\biggl( \displaystyle \frac{20}{\beta^4} \biggr)^{\frac{3}{2}}}=\frac{74}{5 \sqrt{5}}=6.61876$
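The arithmetic above can be checked with exact fractions. This sketch applies formula (3) to $Y=X^2$ using $E(X^k)=k!/\beta^k$ and confirms that the result, $\frac{74}{5\sqrt{5}}$, does not depend on the rate $\beta$ (the rates tried are arbitrary; the function names are hypothetical).

```python
import math
from fractions import Fraction


def exponential_raw_moment(beta, k):
    """E(X^k) = k! / beta^k for an exponential variable with rate beta."""
    return Fraction(math.factorial(k)) / Fraction(beta) ** k


def skewness_of_square(beta):
    """Skewness of Y = X^2 via formula (3), using E(Y^j) = E(X^(2j))."""
    m1, m2, m3 = (exponential_raw_moment(beta, 2 * j) for j in (1, 2, 3))
    var = m2 - m1**2                            # 20 / beta^4
    third_central = m3 - 3 * m1 * var - m1**3   # 592 / beta^6
    return float(third_central) / float(var) ** 1.5


for beta in (Fraction(1, 3), Fraction(1), Fraction(5, 2)):
    print(beta, skewness_of_square(beta), 74 / (5 * math.sqrt(5)))
```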

Example 5: Exponentiation
Raising a left skewed distribution to a positive power can produce a distribution that is less left skewed. The use of increasing exponents eventually produces a positively skewed distribution. Let $X$ be the beta random variable with $\alpha=5$ and $\beta=1$. The density function for $X$ is $f(x)=5x^4$ where $0<x<1$. Using the formula shown in Example 2 above, the coefficient of skewness is

$\displaystyle \gamma_1=\frac{2(1-5) \sqrt{5+1+1}}{(5+1+2) \sqrt{5}}=-1.183215957$

We wish to calculate the coefficient of skewness for $X^2$. To do that, it will be helpful to have a formula for the raw moments of $X$. It is easy to verify that:

$\displaystyle E(X^k)=\frac{5}{5+k} \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

The first three moments of $Y=X^2$ are:

$\displaystyle E(Y)=E(X^2)=\frac{5}{7}$

$\displaystyle E(Y^2)=E(X^4)=\frac{5}{9}$

$\displaystyle E(Y^3)=E(X^6)=\frac{5}{11}$

Via formula (3), the following is the coefficient of skewness for $Y=X^2$.

$\displaystyle \gamma_1=\frac{\displaystyle \frac{5}{11}-3 \ \frac{5}{7} \ \frac{20}{441}-\frac{125}{7^3}}{\biggl( \displaystyle \frac{20}{441} \biggr)^{\frac{3}{2}}}=\frac{-18}{11 \sqrt{5}}=-0.731804065$

In this example, squaring the beta distribution with skewness -1.1832 produces a distribution that is still negatively skewed but with a smaller skew in absolute value. Let’s raise $X$ to higher powers. The following shows the results:

$\displaystyle X^3 \ \ \ \ \ \ \gamma_1=\frac{-2 \sqrt{11}}{7 \sqrt{5}}=-0.423782771$

$\displaystyle X^4 \ \ \ \ \ \ \gamma_1=\frac{-2 \sqrt{13}}{17 \sqrt{5}}=-0.189700182$

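$\displaystyle X^5 \ \ \ \ \ \ \gamma_1=0$

$\displaystyle X^6 \ \ \ \ \ \ \gamma_1=\frac{2 \sqrt{17}}{23 \sqrt{5}}=0.160339904$

The skewness of $X^5$ is exactly zero: for $0<y<1$, $P(X^5 \le y)=P(X \le y^{1/5})=(y^{1/5})^5=y$, so $X^5$ is uniform on $(0,1)$, a symmetric distribution.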

Raising the beta distribution with $\alpha=5$ and $\beta=1$ to higher powers eventually produces a positively skewed distribution. This is an interesting example, though this observation probably should not be taken as a rule.
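The pattern in this example can be reproduced for any power $n$. Since $E(X^k)=\frac{5}{5+k}$, the raw moments of $Y=X^n$ are $E(Y^j)=\frac{5}{5+nj}$. The sketch below applies (3) to these moments and compares the result with the beta formula of Example 2 using $\alpha=\frac{5}{n}$ and $\beta=1$: because $P(X^n \le y)=y^{5/n}$, the variable $X^n$ is itself beta distributed with those parameters. The function name is hypothetical.

```python
import math
from fractions import Fraction


def skewness_of_power(n):
    """Skewness of Y = X^n where X has density 5x^4 on (0, 1).

    Uses E(Y^j) = E(X^(n j)) = 5 / (5 + n j) and formula (3) with exact fractions.
    """
    m1, m2, m3 = (Fraction(5, 5 + n * j) for j in (1, 2, 3))
    var = m2 - m1**2
    third_central = m3 - 3 * m1 * var - m1**3
    return float(third_central) / float(var) ** 1.5


for n in range(1, 7):
    # X^n has CDF y^(5/n), i.e. it is beta(5/n, 1); compare with Example 2's formula
    a, b = 5 / n, 1
    closed_form = 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))
    print(n, round(skewness_of_power(n), 9), round(closed_form, 9))
```

The sign changes exactly at $n=5$, where $\alpha=\beta=1$; for $n>5$ the $\beta$ parameter dominates and the skew is positive.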

________________________________________________________________________

Counterexamples

All the examples discussed so far are good “textbook” examples in that they help build intuition on how skewness behaves in familiar distributions, and they serve as a good introduction to the topic of skewness. However, it is also easy to draw the wrong lessons from them, so it is important to point out that some of the commonly drawn lessons do not hold in all circumstances.

As indicated earlier, one wrong lesson is that a density curve like the one in Figure 1 implies "mode < median < mean" for every right skewed distribution, and that a curve like the one in Figure 2 implies "mean < median < mode" for every left skewed distribution. In both Figure 1 and Figure 2, the mean is further out in the long tail than the median. Certain textbooks even state these two observations as characterizations of right skew and left skew. Such a rule of thumb is easy to state and easy to apply, and for some students it provides a lot of clarity about how skewness should work. For such students, checking for skewness is simply a matter of finding the relative positions of the mean and the median (e.g., in such thinking, if the mean is greater than the median, then the distribution is right skewed).

Any discussion of skewness should point out that the simple rule described in the above paragraph, though useful in many situations, is imperfect and may not apply outside of certain familiar distributions. For a good discussion on this issue, see this article.

We highlight one example found in the article mentioned above. This example demonstrates a clear violation of the common misconception indicated above. The following is the density function of the example.

$\displaystyle f_p(x)=\left\{\begin{matrix} \displaystyle (1-p)\biggl(1+\frac{1-p}{2p} x \biggr)&\ \ \ \ \ \ \frac{-2p}{1-p} \le x \le 0 \\{\text{ }}& \\ (1-p) e^{-x}&\ \ \ \ \ \ x>0 \end{matrix}\right.$

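where $0<p<1$. To facilitate the discussion, let $X_p$ be the random variable whose PDF is $f_p(x)$ defined above. The density function is a juxtaposition of a triangular density and an exponential density: the triangle on $\frac{-2p}{1-p} \le x \le 0$ carries probability $p$ and the exponential piece on $x>0$ carries probability $1-p$. This triangular-exponential distribution has a positive coefficient of skewness as long as $p$ is not too large (Example 8 shows that $p=0.9$ already gives a negative skew). Yet within the range of $p$ that gives a positive skew, the mean can be made to lie on either side of the median. We consider three cases: $p=0.7$, $p=0.6$ and $p=0.9$.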
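Because $f_p$ is a mixture, it is easy to simulate: with probability $p$ draw from the triangle (inverting its conditional CDF $\bigl(\frac{x+a}{a}\bigr)^2$ with $a=\frac{2p}{1-p}$), otherwise draw from the exponential. The following Python sketch uses this to estimate the mean, median and skewness for the three values of $p$ considered below. The seed and sample size are arbitrary choices, so the estimates should only be close to, not equal to, the exact values in Examples 6 to 8; the function names are hypothetical.

```python
import math
import random
import statistics


def sample_x_p(p, rng):
    """One draw from f_p: triangle on [-a, 0] with probability p, Exp(1) otherwise."""
    a = 2 * p / (1 - p)
    if rng.random() < p:
        # Triangle part: conditional CDF is ((x + a) / a)^2, inverted at a uniform draw
        return a * (math.sqrt(rng.random()) - 1)
    return rng.expovariate(1.0)


def sample_skewness(xs):
    """Plug-in estimate: average cubed deviation / (average squared deviation)^1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(12345)   # fixed seed so reruns give the same estimates
estimates = {}
for p in (0.6, 0.7, 0.9):
    xs = [sample_x_p(p, rng) for _ in range(300_000)]
    estimates[p] = (statistics.fmean(xs), statistics.median(xs), sample_skewness(xs))
    print(p, estimates[p])
```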

Example 6
First the case $p=0.7$.

$\displaystyle f_{0.7}(x)=\left\{\begin{matrix} \displaystyle 0.3\biggl(1+\frac{3}{14} x \biggr)&\ \ \ \ \ \ \frac{-14}{3} \le x \le 0 \\{\text{ }}& \\ 0.3 e^{-x}&\ \ \ \ \ \ x>0 \end{matrix}\right.$

The following is the graph of the density curve $f_{0.7}(x)$. The right tail is long since the exponential distribution is on the right side. However, the left side is heavier (with 70% of the weight on the triangle on the left side).

Figure 5

The following shows the results for the density function $f_{0.7}(x)$.

$\displaystyle E(X_{0.7})=\frac{-21.3}{27}=-0.7889$

$\displaystyle \text{median of } X_{0.7} = -0.722613478$

$\displaystyle E(X_{0.7}^2)=\frac{254.4}{81}=3.140740741$

$\displaystyle Var(X_{0.7})=\frac{1835.91}{3^6}$

$\displaystyle E(X_{0.7}^3)=\frac{-2152.2}{405}$

$\displaystyle \gamma_1=\frac{111906.63}{5 \cdot 1835.91^{1.5}}=0.284517335$

The calculation confirms a modest positive skew (0.2845). Note that the mean is to the left of the median, and both the mean and the median are to the left of the mode (at x = 0). In Figure 5, the right tail is infinitely long, and the calculation of $\gamma_1$ confirms that the distribution is positively skewed. According to the common notion of how right skew should work, the mean should be further out on the right tail than the median. But this is not the case: the mean is further out on the left side than the median. The common conception of skewness can be violated in this way when one tail is long but the other side is heavier.

Example 7
Now the case $p=0.6$.

$\displaystyle f_{0.6}(x)=\left\{\begin{matrix} \displaystyle 0.4\biggl(1+\frac{1}{3} x \biggr)&\ \ \ \ \ \ -3 \le x \le 0 \\{\text{ }}& \\ 0.4 e^{-x}&\ \ \ \ \ \ x>0 \end{matrix}\right.$

The following is the graph of the density curve $f_{0.6}(x)$. The right tail is long since the exponential distribution is on the right side. The left side is still heavy but a little less heavy than in the previous example (with 60% of the weight on the triangle on the left side).

Figure 6

The following shows the results for the density function $f_{0.6}(x)$.

$\displaystyle E(X_{0.6})=-0.2$

$\displaystyle \text{median of } X_{0.6} = -0.261387212$

$\displaystyle E(X_{0.6}^2)=1.7$

$\displaystyle Var(X_{0.6})=1.66$

$\displaystyle E(X_{0.6}^3)=0.78$

$\displaystyle \gamma_1=0.834128035$

The density curve $f_{0.6}(x)$ has a stronger positive skew than the previous example, as there is a little more weight on the exponential side (the right side). Although the mean in this case is to the right of the median, neither of them lies on the right tail; both are on the left triangular side (the heavier side). In any case, the mean is definitely not further out on the longer tail (the right tail), as the common rule of thumb would suggest.

Both Example 6 and Example 7 are right skewed distributions that do not conform to the common expectation about right skewed distributions. The next example goes further and dispels the notion that a long right tail determines the direction of the skew.

Example 8
Here we use $p=0.9$, so that there is still a long right tail but 90% of the weight is on the other side.

$\displaystyle f_{0.9}(x)=\left\{\begin{matrix} \displaystyle 0.1\biggl(1+\frac{1}{18} x \biggr)&\ \ \ \ \ \ -18 \le x \le 0 \\{\text{ }}& \\ 0.1 e^{-x}&\ \ \ \ \ \ x>0 \end{matrix}\right.$

The overall shape of $f_{0.9}(x)$ is similar to that in Figure 5 and Figure 6. The following shows the results for the density function $f_{0.9}(x)$.

$\displaystyle E(X_{0.9})=-5.3$

$\displaystyle E(X_{0.9}^2)=48.8$

$\displaystyle Var(X_{0.9})=20.71$

$\displaystyle E(X_{0.9}^3)=-524.28$

$\displaystyle \gamma_1=-0.489285839$

Because there is so little weight on the right tail, the skewness is actually negative (-0.48928). Here we have a right skewed looking distribution that is actually skewed to the left!
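The moments in Examples 6 to 8 follow a single pattern. Splitting $E(X_p^k)$ into the triangle piece and the exponential piece gives $E(X_p^k)=\frac{2p(-a)^k}{(k+1)(k+2)}+(1-p)\,k!$ with $a=\frac{2p}{1-p}$, and when $p \ge 0.5$ the median solves $p\bigl(\frac{x+a}{a}\bigr)^2=\frac{1}{2}$. The sketch below (function name hypothetical) uses these expressions and formula (3); it should reproduce the values listed in Examples 6 to 8, and it also reports the median for $p=0.9$, which the examples do not list.

```python
import math


def x_p_summary(p):
    """Mean, median, variance and skewness of X_p in closed form (0 < p < 1).

    E(X^k) = 2p(-a)^k / ((k+1)(k+2))  [triangle on [-a, 0], a = 2p/(1-p), mass p]
           + (1-p) k!                 [Exp(1) piece on x > 0, mass 1-p]
    """
    a = 2 * p / (1 - p)
    m1, m2, m3 = (2 * p * (-a) ** k / ((k + 1) * (k + 2)) + (1 - p) * math.factorial(k)
                  for k in (1, 2, 3))
    var = m2 - m1**2
    skew = (m3 - 3 * m1 * var - m1**3) / var**1.5      # formula (3)
    if p >= 0.5:
        # The triangle holds at least half the mass: solve p ((x + a) / a)^2 = 1/2
        median = a * (math.sqrt(0.5 / p) - 1)
    else:
        # Otherwise solve p + (1 - p)(1 - e^{-x}) = 1/2 for x > 0
        median = math.log(2 * (1 - p))
    return m1, median, var, skew


for p in (0.6, 0.7, 0.9):
    mean, median, var, skew = x_p_summary(p)
    print(f"p={p}: mean={mean:.6f}, median={median:.6f}, var={var:.6f}, skewness={skew:.6f}")
```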

________________________________________________________________________

Remarks

Examples 6 through 8 demonstrate that when one tail is long but the other side is heavy, the common conception of right skew and left skew does not apply. The common conception, as discussed earlier, is that both the mean and the median are located in the longer tail and that the mean is further out in the long tail than the median. The article mentioned earlier is easy to read and gives a fuller discussion of the issues surrounding the notion of skewness. The common conception of skewness can easily be violated in discrete distributions, especially when the weights on the two sides of the median are not equal. All the above examples are unimodal distributions. According to the quoted article, bimodal and multimodal distributions can be problematic too.

Of course, the caveat presented here is not meant to discourage anyone from discussing the common conception about skewness. The common conception about the locations of the mode, mean and median conveys useful intuition, and we should continue to focus on it. But the common rule of thumb should not be presented as gospel truth, as some textbooks have done. Instead, it should be pointed out that the rule of thumb is imperfect, and it would be helpful to discuss why it is imperfect.

________________________________________________________________________

Practice problems

Practice problems to reinforce the calculation are found in the companion blog to this blog.

________________________________________________________________________
$\copyright \ \text{2015 by Dan Ma}$