# The skewness of a probability distribution

In this post, we discuss how to calculate the moment coefficient of skewness and also discuss some issues surrounding the notion of skewness.

________________________________________________________________________

Looking at graphs

One informal but useful way of checking the skewness of a distribution is to look at the density curve (or a histogram). Consider the following density functions.

Figure 1

Figure 2

The density curve in Figure 1 has a longer tail to the right than to the left. The example in Figure 1 is a distribution that is skewed to the right. It is also said to be positively skewed since its coefficient of skewness is positive. The density curve in Figure 2 has a longer tail to the left than to the right. The example in Figure 2 is a distribution that is skewed to the left. It is also said to be negatively skewed since the skewness coefficient is negative. If a density curve looks the same to the left and to the right (such as the bell curve for the normal distribution), then it is a symmetric distribution and the skewness coefficient is zero.

The distribution in Figure 1 is a right skewed distribution (the longer tail is on the right). It is a gamma distribution with mean 2 and median approximately 1.678347. The mode (the highest peak) is at x = 1. The distribution in Figure 2 is a left skewed distribution (the longer tail is on the left) with mean and median approximately 0.909 and 0.9213562, respectively. The mode is at 0.95.

________________________________________________________________________

In the distribution for Figure 1, we can say that "mode < median < mean". In the distribution for Figure 2, we can say that "mean < median < mode". A common conception is that these simple rules characterize all skewed distributions, i.e., the mean is to the right of the median, which in turn is to the right of the mode in a right skewed distribution, and the mean is to the left of the median, which in turn is to the left of the mode in a left skewed distribution. Such rules are certainly easy to remember and are stated in some statistics textbooks. In the above two figures, this rule of thumb is certainly true. It turns out that this rule of thumb does not hold in many instances. The above two graphs are "textbook" demonstrations of skewness. They are a gamma distribution and a beta distribution, and they behave well according to the usual notion of what skewed distributions should look like. In a later section of this post, we will discuss this issue in greater detail. First we define the coefficient of skewness.

________________________________________________________________________

Pearson moment coefficient of skewness

The measure of skewness defined here is called the Pearson moment coefficient of skewness. This measure provides information about the amount and direction of the departure from symmetry. Its value can be positive or negative, or even undefined. The higher the absolute value of the skewness measure, the more asymmetric the distribution. The skewness measure of a symmetric distribution is zero (whenever it is defined), and it is near zero for a distribution that is close to symmetric.

To help put the definition of skewness in context, we first define raw moments and central moments of a random variable $X$. The $k$th raw moment of $X$ is $E(X^k)$, the expected value of the $k$th power of the random variable $X$. The first raw moment is the mean of the random variable and is usually denoted by $\mu$.

The $k$th central moment of a random variable $X$ is $E[(X-\mu)^k]$, the expected value of the $k$th power of the deviation of the variable from its mean. The moment $E[(X-\mu)^k]$ is usually denoted by $\mu_k$. The second central moment is usually called the variance and is denoted by $\sigma^2$. The square root of $\sigma^2$, $\sigma$, is the standard deviation.

The ratio of the standard deviation to the mean, $\displaystyle \frac{\sigma}{\mu}$, is called the coefficient of variation.

The ratio of the third central moment to the cube of the standard deviation is called Pearson’s moment coefficient of skewness (or the coefficient of skewness) and is denoted by $\gamma_1$.

$\displaystyle \gamma_1=\frac{E[ (X-\mu)^3 ]}{\sigma^3}=\frac{\mu_3}{\sigma^3} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

The skewness in (1) can be expanded to derive a version that can be calculated more easily:

$\displaystyle \begin{aligned} \gamma_1&=\frac{E[ (X-\mu)^3 ]}{\sigma^3} \\&=\frac{E(X^3)-3 \mu E(X^2)+3 \mu^2 E(X)-\mu^3}{\sigma^3} \\&=\frac{E(X^3)-3 \mu [E(X^2)-\mu E(X)]-\mu^3}{\sigma^3} \\&=\frac{E(X^3)-3 \mu \sigma^2-\mu^3}{\sigma^3} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2) \\&=\frac{E(X^3)-3 \mu \sigma^2-\mu^3}{(\sigma^2)^{\frac{3}{2}}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3) \end{aligned}$

The last version (3) is in terms of the first raw moment $\mu$, the second central moment $\sigma^2$ and the third raw moment $E(X^3)$. Essentially, the coefficient $\gamma_1$ can be obtained via (3) by first computing the first three raw moments.
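As a minimal sketch of how formula (3) can be turned into a computation, the following Python function takes the first three raw moments and returns $\gamma_1$. The function name `skewness_from_raw_moments` is only an illustrative choice. The check at the end uses the raw moments 1, 2 and 6 of the exponential distribution with rate 1, whose skewness is 2.

```python
def skewness_from_raw_moments(m1, m2, m3):
    """Pearson moment coefficient of skewness via formula (3).

    m1, m2, m3 are the raw moments E(X), E(X^2), E(X^3).
    """
    variance = m2 - m1 ** 2                      # sigma^2 = E(X^2) - mu^2
    if variance <= 0:
        raise ValueError("variance must be positive")
    third_central = m3 - 3 * m1 * variance - m1 ** 3
    return third_central / variance ** 1.5       # divide by (sigma^2)^(3/2)


# Exponential distribution with rate 1: E(X^k) = k!
print(skewness_from_raw_moments(1, 2, 6))        # 2.0
```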

Even though kurtosis is not the focus of this post, we would like to state it to complete the brief discussion on moments. The ratio of the fourth central moment to the fourth power of the standard deviation, $\displaystyle \gamma_2=\frac{\mu_4}{\sigma^4}$, is called the kurtosis.

________________________________________________________________________

Examples

In this section, we discuss the skewness in two familiar families of continuous distributions – gamma and beta. We also demonstrate how exponentiation can affect skewness.

Example 1 – Gamma Distribution
The following is the probability density function of the gamma distribution.

$\displaystyle f(x)=\frac{\beta^\alpha}{\Gamma(\alpha)} \ x^{\alpha-1} \ e^{-\beta x} \ \ \ \ \ \ \ \ \ x>0$

where $\Gamma(\cdot)$ is the gamma function, and $\alpha$ and $\beta$ are parameters such that $\alpha>0$ and $\beta>0$. The number $\alpha$ is the shape parameter and the number $\beta$ here is the rate parameter. Figure 1 shows the gamma distribution with $\alpha=2$ and $\beta=1$. When $\alpha=1$, we obtain the exponential distribution. When $\beta=\frac{1}{2}$ and $\alpha=\frac{k}{2}$ where $k$ is a positive integer, we obtain the chi square distribution with $k$ degrees of freedom.

Let $X$ be a random variable with the above gamma density function. The raw moments $E(X^k)$, where $k=1,2,3,\cdots$, are:

$\displaystyle E(X^k)=\frac{(\alpha+k-1)(\alpha+k-2) \cdots \alpha}{\beta^k}$

Using the first two raw moments to calculate the variance as well as the third moment, the following calculates the moment coefficient of skewness, based on the form in (3):

$\displaystyle \gamma_1=\frac{\displaystyle \frac{(\alpha+2)(\alpha+1)\alpha}{\beta^3}-3 \frac{\alpha}{\beta} \frac{\alpha}{\beta^2}-\frac{\alpha^3}{\beta^3}}{\biggl( \displaystyle \frac{\alpha}{\beta^2} \biggr)^{\frac{3}{2}}}=\frac{2}{\sqrt{\alpha}}$

The above calculation shows that the rate parameter $\beta$ has no effect on skewness. The example in Figure 1 has $\alpha=2$, giving a coefficient of skewness of $\frac{2}{\sqrt{2}}=\sqrt{2}$ = 1.414213562. In general, the gamma distribution is skewed positively. However, the gamma distribution becomes more and more symmetric as the shape parameter $\alpha \rightarrow \infty$. The following graph shows the gamma densities for $\alpha=1, 2, 3, 5, 6$ and $\beta=1$.

Figure 3

In Figure 3, the light blue density with $\alpha=1$ is an exponential distribution. The red one with $\alpha=2$ is the density in Figure 1. With $\alpha=6$, the gamma density already looks very symmetric (the dark blue).

On the other hand, as the shape parameter $\alpha \rightarrow 0$, the gamma distribution becomes increasingly positively skewed. When $\alpha=\frac{1}{n}$, $\gamma_1=2 \sqrt{n}$. When $n \rightarrow \infty$, $\gamma_1 \rightarrow \infty$.
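The gamma calculation can be checked numerically. The following is a small sketch (the function names are illustrative, not from any library) that builds the first three raw moments from the product formula above and plugs them into formula (3). For each shape parameter it prints the computed skewness next to $2/\sqrt{\alpha}$; the rate parameter is varied to show that it has no effect.

```python
import math


def gamma_raw_moment(alpha, beta, k):
    """E(X^k) = alpha (alpha+1) ... (alpha+k-1) / beta^k for the gamma distribution."""
    product = 1.0
    for j in range(k):
        product *= alpha + j
    return product / beta ** k


def gamma_skewness(alpha, beta):
    """Skewness via formula (3), using the first three raw moments."""
    m1, m2, m3 = (gamma_raw_moment(alpha, beta, k) for k in (1, 2, 3))
    variance = m2 - m1 ** 2
    return (m3 - 3 * m1 * variance - m1 ** 3) / variance ** 1.5


for alpha in (0.25, 1, 2, 6):
    for beta in (0.5, 1, 3):
        # computed value vs. the closed form 2 / sqrt(alpha)
        print(alpha, beta, gamma_skewness(alpha, beta), 2 / math.sqrt(alpha))
```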

Example 2 – Beta Distribution
The following is the PDF of a beta distribution:

$\displaystyle f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \ x^{\alpha-1} \ (1-x)^{\beta-1} \ \ \ \ \ \ \ \ \ \ \ \ 0<x<1$

where $\Gamma(\cdot)$ is the gamma function, and $\alpha$ and $\beta$ are parameters such that $\alpha>0$ and $\beta>0$. Both $\alpha$ and $\beta$ are shape parameters.

In the beta family of distributions, the skewness can range from positive to negative. If the $\alpha$ parameter dominates (i.e. $x$ is raised to a high power and $1-x$ is raised to a small power in the density function), then the beta distribution has a negative skew (skewed to the left). This is because the function $x^n$ has left skew and the function $(1-x)^n$ has right skew. Then the skewness of the beta distribution follows the one that dominates. If the $\beta$ parameter dominates, the beta distribution is skewed to the right. If both parameters are roughly equal, the beta distribution is close to symmetric. For example when $\alpha=20$ and $\beta=2$, the beta distribution is left skewed (its density curve is in Figure 2). As in the gamma case, the skewness of the beta distribution has a closed form. The following formula confirms the intuition about the skewness of the beta distribution (found here).

$\displaystyle \gamma_1=\frac{2(\beta-\alpha) \ \sqrt{\alpha+\beta+1}}{(\alpha+\beta+2) \ \sqrt{\alpha \ \beta}}$

Thus the beta distribution with $\alpha=20$ and $\beta=2$ has skewness coefficient -1.137431317. The following figure further demonstrates the role the shape parameters play in changing the skewness of the beta distribution.

Figure 4

In Figure 4, as the $\alpha$ parameter goes from 2 to 20, the skewness goes from 1.137431317 to 0.659393193 to 0 to -0.659393193 to -1.137431317.
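The closed form for the beta skewness is easy to evaluate directly. The short sketch below (the function name `beta_skewness` is an illustrative choice) evaluates the formula for the left skewed case $\alpha=20$, $\beta=2$ and for a symmetric case with equal parameters.

```python
import math


def beta_skewness(a, b):
    """Closed-form skewness of the beta distribution with shape parameters a and b."""
    return 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))


print(round(beta_skewness(20, 2), 6))   # -1.137431
print(beta_skewness(5, 5))              # 0.0
```

Swapping the two parameters flips the sign of the result, which matches the mirror-image behavior described above.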

Example 3 – Exponentiation
Symmetric distributions have zero coefficient of skewness. Raising a symmetric distribution to a positive power can produce a skewed distribution. For example, let $X$ be the standard normal random variable (mean 0 and variance 1). Let $Y=X^2$. Then $Y$ has a chi-square distribution with 1 degree of freedom, which means that it is a gamma distribution with $\alpha=\frac{1}{2}$ and $\beta=\frac{1}{2}$. According to Example 1 above, the skewness coefficient is $\frac{2}{\sqrt{0.5}}=2 \sqrt{2}=2.828$. Thus squaring a standard normal distribution produces a very strongly positively skewed distribution.

Example 4 – Exponentiation
Raising a positively skewed distribution to a positive power can produce a more strongly positively skewed distribution. For example, let $X$ be an exponential random variable. Example 1 shows that exponential distributions have skewness coefficient 2. We show that the coefficient of skewness for $Y=X^2$ is approximately 6.619.

The density function for the exponential random variable $X$ is $f(x)=\beta e^{-\beta x}$ where $\beta>0$ is the rate parameter. It can be shown that the raw moments of $X$ are:

$\displaystyle E(X^k)=\frac{k!}{\beta^k} \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

Then the first three moments of $Y$ are:

$\displaystyle E(Y)=E(X^2)=\frac{2}{\beta^2}$

$\displaystyle E(Y^2)=E(X^4)=\frac{24}{\beta^4}$

$\displaystyle E(Y^3)=E(X^6)=\frac{720}{\beta^6}$

With the first two raw moments, calculate the variance of $Y$. Then compute $\gamma_1$ via formula (3).

$\displaystyle \gamma_1=\frac{\displaystyle \frac{720}{\beta^6}-3 \ \frac{2}{\beta^2} \ \frac{20}{\beta^4}-\frac{8}{\beta^6}}{\biggl( \displaystyle \frac{20}{\beta^4} \biggr)^{\frac{3}{2}}}=\frac{74}{5 \sqrt{5}}=6.61876$
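The same steps can be sketched in Python for a general positive integer power $n$, assuming only the raw moment formula $E(X^k)=k!/\beta^k$ stated above. The function name `exp_power_skewness` and its `rate` argument are illustrative choices.

```python
import math


def exp_power_skewness(n, rate=1.0):
    """Skewness of Y = X^n where X is exponential with the given rate.

    Uses E(X^k) = k! / rate^k, so that E(Y^j) = (n j)! / rate^(n j).
    """
    m1, m2, m3 = (math.factorial(n * j) / rate ** (n * j) for j in (1, 2, 3))
    variance = m2 - m1 ** 2
    return (m3 - 3 * m1 * variance - m1 ** 3) / variance ** 1.5


print(round(exp_power_skewness(1), 5))   # 2.0
print(round(exp_power_skewness(2), 5))   # 6.61876
```

Changing `rate` leaves the result unchanged up to rounding, since the powers of $\beta$ cancel in the ratio.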

Example 5 – Exponentiation
Raising a left skewed distribution to a positive power can produce a distribution that is less left skewed. The use of increasing exponents eventually produces a positively skewed distribution. Let $X$ be the beta random variable with $\alpha=5$ and $\beta=1$. The density function for $X$ is $f(x)=5x^4$ where $0<x<1$. Using the formula shown in Example 2 above, the coefficient of skewness is

$\displaystyle \gamma_1=\frac{2(1-5) \sqrt{5+1+1}}{(5+1+2) \sqrt{5}}=-1.183215957$

We wish to calculate the coefficient of skewness for $X^2$. To do that, it will be helpful to have a formula for the raw moments of $X$. It is easy to verify that:

$\displaystyle E(X^k)=\frac{5}{5+k} \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

The first three moments of $Y=X^2$ are:

$\displaystyle E(Y)=E(X^2)=\frac{5}{7}$

$\displaystyle E(Y^2)=E(X^4)=\frac{5}{9}$

$\displaystyle E(Y^3)=E(X^6)=\frac{5}{11}$

Via formula (3), the following is the coefficient of skewness for $Y=X^2$.

$\displaystyle \gamma_1=\frac{\displaystyle \frac{5}{11}-3 \ \frac{5}{7} \ \frac{20}{441}-\frac{125}{7^3}}{\biggl( \displaystyle \frac{20}{441} \biggr)^{\frac{3}{2}}}=\frac{-18}{11 \sqrt{5}}=-0.731804065$

In this example, squaring the beta distribution with skewness -1.1832 produces a distribution that is still negatively skewed but with a smaller skew. Let’s raise $X$ to higher powers. The following shows the results:

$\displaystyle X^3 \ \ \ \ \ \ \gamma_1=\frac{-2 \sqrt{11}}{7 \sqrt{5}}=-0.423782771$

$\displaystyle X^4 \ \ \ \ \ \ \gamma_1=\frac{-2 \sqrt{13}}{17 \sqrt{5}}=-0.189700182$

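$\displaystyle X^5 \ \ \ \ \ \ \gamma_1=0$

$\displaystyle X^6 \ \ \ \ \ \ \gamma_1=\frac{2 \sqrt{17}}{23 \sqrt{5}} \approx 0.1603$

Note that $X^5$ is uniformly distributed on $(0,1)$, since $\displaystyle E[(X^5)^k]=\frac{5}{5+5k}=\frac{1}{1+k}$, which explains the zero skewness.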

Raising the beta distribution with $\alpha=5$ and $\beta=1$ to higher powers eventually produces a positively skewed distribution. This is an interesting example, though this observation probably should not be taken as a rule.
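The pattern can be explored with a few lines of Python. This is only a sketch based on the moment formula $E(X^k)=\frac{5}{5+k}$ from this example; the function name `power_skewness` is hypothetical. For $n=1,2,3,4$ it reproduces the negative values computed above, and for a larger power such as $n=6$ the result is positive.

```python
def power_skewness(n):
    """Skewness of Y = X^n where X has density 5x^4 on (0, 1).

    Uses E(X^k) = 5 / (5 + k), so that E(Y^j) = 5 / (5 + n j).
    """
    m1, m2, m3 = (5 / (5 + n * j) for j in (1, 2, 3))
    variance = m2 - m1 ** 2
    return (m3 - 3 * m1 * variance - m1 ** 3) / variance ** 1.5


for n in (1, 2, 3, 4, 6):
    print(n, round(power_skewness(n), 4))
```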

________________________________________________________________________

Counterexamples

All the examples discussed previously are good “textbook” examples in that they help build intuition on how skewness behaves in familiar distributions. However, it is also easy to take the wrong lessons from these examples. The above examples can serve as a good introduction to the topic of skewness, but it is also important to add the caveat that some of the commonly drawn lessons are not appropriate in all circumstances.

As indicated earlier, one wrong lesson from Figure 1 and Figure 2 is that a density curve such as Figure 1 may suggest that "mode < median < mean" for a right skewed distribution and that Figure 2 may suggest that "mean < median < mode" for a left skewed distribution. In both Figure 1 and Figure 2, the mean is further out in the long tail than the median. In certain textbooks, these two observations are even stated as characterizations of right skew and left skew. Such a rule of thumb is easy to state and easy to apply. For some students, such a rule provides a lot of clarity about how skewness should work. For such students, checking for skewness is simply a matter of finding the relative position of the mean and median (e.g. in such thinking, if the mean is greater than the median, then it is a right skew).

Any discussion of skewness should point out that the simple rule described in the above paragraph, though useful in many situations, is imperfect and may not apply outside of certain familiar distributions. For a good discussion on this issue, see this article.

We highlight one example found in the article mentioned above. This example demonstrates a clear violation of the common misconception indicated above. The following is the density function of the example.

$\displaystyle f_p(x)=\left\{\begin{matrix} \displaystyle (1-p)\biggl(1+\frac{1-p}{2p} x \biggr)&\ \ \ \ \ \ \frac{-2p}{1-p} \le x \le 0 \\{\text{ }}& \\ (1-p) e^{-x}&\ \ \ \ \ \ x>0 \end{matrix}\right.$

where $0<p<1$. To facilitate the discussion, let $X_p$ be the random variable whose PDF is $f_p(x)$ defined above. The above density function is a juxtaposition of a triangular density and an exponential density. This triangular-exponential distribution has positive coefficient of skewness when $0<p<0.755$ (approximately). Yet within this range for $p$, the mean can be made to be on either side of the median. We consider three cases where $p=0.7$, $p=0.6$ and $p=0.9$.

Example 6
First the case $p=0.7$.

$\displaystyle f_{0.7}(x)=\left\{\begin{matrix} \displaystyle 0.3\biggl(1+\frac{3}{14} x \biggr)&\ \ \ \ \ \ \frac{-14}{3} \le x \le 0 \\{\text{ }}& \\ 0.3 e^{-x}&\ \ \ \ \ \ x>0 \end{matrix}\right.$

The following is the graph of the density curve $f_{0.7}(x)$. The right tail is long since the exponential distribution is on the right side. However, the left side is heavier (with 70% of the weight on the triangle on the left side).

Figure 5

The following shows the results for the density function $f_{0.7}(x)$.

$\displaystyle E(X_{0.7})=\frac{-21.3}{27}=-0.7889$

$\displaystyle \text{median of } X_{0.7} = -0.722613478$

$\displaystyle E(X_{0.7}^2)=\frac{254.4}{81}=3.140740741$

$\displaystyle Var(X_{0.7})=\frac{1835.91}{3^6}$

$\displaystyle E(X_{0.7}^3)=\frac{-2152.2}{405}$

$\displaystyle \gamma_1=\frac{111906.63}{5 \cdot 1835.91^{1.5}}=0.284517335$

The calculation confirms the positive skew (0.2845), which is a moderately strong positive skewness. Note that the mean is to the left of the median. Both the mean and median are to the left of the mode (at x = 0). In Figure 5, the right tail is infinitely long, so the distribution looks positively skewed, and the calculation of $\gamma_1$ confirms this. According to the common notion of how right skew should work, the mean should be further out on the right tail. But this is not the case. The mean is further out on the left side than the median. The violation of the common conception of skewness can occur when one tail is long but the other side is heavier.

Example 7
Now the case $p=0.6$.

$\displaystyle f_{0.6}(x)=\left\{\begin{matrix} \displaystyle 0.4\biggl(1+\frac{1}{3} x \biggr)&\ \ \ \ \ \ -3 \le x \le 0 \\{\text{ }}& \\ 0.4 e^{-x}&\ \ \ \ \ \ x>0 \end{matrix}\right.$

The following is the graph of the density curve $f_{0.6}(x)$. The right tail is long since the exponential distribution is on the right side. The left side is still heavy but a little less heavy than in the previous example (with 60% of the weight on the triangle on the left side).

Figure 6

The following shows the results for the density function $f_{0.6}(x)$.

$\displaystyle E(X_{0.6})=-0.2$

$\displaystyle \text{median of } X_{0.6} = -0.261387212$

$\displaystyle E(X_{0.6}^2)=1.7$

$\displaystyle Var(X_{0.6})=1.66$

$\displaystyle E(X_{0.6}^3)=0.78$

$\displaystyle \gamma_1=0.834128035$

The density curve $f_{0.6}(x)$ has a stronger positive skew than the previous example as there is a little more weight on the exponential side (the right side). Even though the mean in this case is to the right of the median, both the mean and median are not on the right tail but on the left triangular side (the heavier side). In any case, the mean is definitely not further out on the longer tail (the right tail) as the common rule of thumb would suggest.

Both Example 6 and Example 7 are right skewed distributions that do not conform to the common expectation about right skewed distributions. The following example will dispel the notion about the direction of the skew.

Example 8
Here we use $p=0.9$ so that there is still a long right tail but 90% of the weight is on the other side.

$\displaystyle f_{0.9}(x)=\left\{\begin{matrix} \displaystyle 0.1\biggl(1+\frac{1}{18} x \biggr)&\ \ \ \ \ \ -18 \le x \le 0 \\{\text{ }}& \\ 0.1 e^{-x}&\ \ \ \ \ \ x>0 \end{matrix}\right.$

The overall shape of the $f_{0.9}(x)$ is similar to Figure 5 and Figure 6. The following shows the results for the density function $f_{0.9}(x)$.

$\displaystyle E(X_{0.9})=-5.3$

$\displaystyle E(X_{0.9}^2)=48.8$

$\displaystyle Var(X_{0.9})=20.71$

$\displaystyle E(X_{0.9}^3)=-524.28$

$\displaystyle \gamma_1=-0.489285839$

Because there is so little weight on the right tail, the skewness is actually negative (-0.48928). Here we have a right skewed looking distribution that is actually skewed to the left!
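The numbers in Examples 6 through 8 can be reproduced with a short Python sketch. It treats $X_p$ as a mixture that puts weight $p$ on the triangle over $[-a,0]$ with $a=\frac{2p}{1-p}$ and weight $1-p$ on a rate 1 exponential. The triangle's raw moments $-a/3$, $a^2/6$ and $-a^3/10$ used in the code are derived here by integrating the density; they are not quoted from the article. The function name `tri_exp_summary` is hypothetical.

```python
import math


def tri_exp_summary(p):
    """Return (mean, median, variance, skewness) of X_p for 0 < p < 1."""
    a = 2 * p / (1 - p)                      # the triangle lives on [-a, 0]
    # Mixture raw moments: triangle part (weight p) plus exponential part (weight 1-p).
    m1 = -p * a / 3 + (1 - p) * 1
    m2 = p * a ** 2 / 6 + (1 - p) * 2
    m3 = -p * a ** 3 / 10 + (1 - p) * 6
    variance = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * variance - m1 ** 3) / variance ** 1.5
    if p >= 0.5:
        # CDF on the triangle is p (1 + x/a)^2; set it equal to 1/2.
        median = a * (math.sqrt(0.5 / p) - 1)
    else:
        # CDF on the exponential side is 1 - (1-p) e^{-x}; set it equal to 1/2.
        median = math.log(2 * (1 - p))
    return m1, median, variance, skew


for p in (0.6, 0.7, 0.9):
    mean, median, variance, skew = tri_exp_summary(p)
    print(p, round(mean, 4), round(median, 4), round(variance, 4), round(skew, 4))
```

For $p=0.7$ the mean falls to the left of the median, for $p=0.6$ it falls to the right, and for $p=0.9$ the skewness turns negative, in line with the three examples.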

________________________________________________________________________

Remarks

Examples 6 through 8 demonstrate that when one tail is long but the other side is heavy, the common conception of right skew and left skew does not apply. The common conception, as discussed earlier, is that both the mean and the median are located in the longer tail and that the mean is further out in the long tail than the median. The article mentioned earlier is easy to read and gives a fuller discussion of the issues when dealing with the notion of skewness. The common conception of skewness can be easily violated in discrete distributions, especially when the weights on both sides of the median are not equal. All the above examples are unimodal distributions. According to the quoted article, bimodal or multimodal distributions can be problematic too.

Of course, the caveat presented here is not meant to discourage anyone from discussing the common conception about skewness. The common conception about the locations of mode, mean and median conveys useful intuition and we should continue to focus on it. But the common rule of thumb should definitely not be presented as gospel truth as some textbooks have done. Instead, it should be pointed out that the common rule of thumb is imperfect, and it would be helpful to have a discussion of why the rule is imperfect.

________________________________________________________________________

Practice problems

Practice problems to reinforce the calculation are found in the companion blog to this blog.

________________________________________________________________________
$\copyright \ \text{2015 by Dan Ma}$

# Conditional Distributions, Part 2

We present more examples to further illustrate the thought process of conditional distributions. A conditional distribution is a probability distribution derived from a given probability distribution by focusing on a subset of the original sample space (we assume that the probability distribution being discussed is a model for some random experiment). The new sample space (the subset of the original one) may be some outcomes that are of interest to an experimenter in a random experiment or may reflect some new information we know about the random experiment in question. We illustrated this thought process in the previous post Conditional Distributions, Part 1 using discrete distributions. In this post, we present some continuous examples for conditional distributions. One concept illustrated by the examples in this post is the notion of mean residual life, which has an insurance interpretation (e.g. the average remaining time until death given that the life in question is alive at a certain age).

_____________________________________________________________________________________________________________________________

The Setting

The thought process of conditional distributions is discussed in the previous post Conditional Distributions, Part 1. We repeat the same discussion using continuous distributions.

Let $X$ be a continuous random variable that describes a certain random experiment. Let $S$ be the sample space of this random experiment. Let $f(x)$ be its probability density function.

We assume that $X$ is a univariate random variable, meaning that the sample space $S$ is the real line $\mathbb{R}$ or a subset of $\mathbb{R}$. Since $X$ is a continuous random variable, we know that $S$ would contain an interval, say, $(a,b)$.

Suppose that in the random experiment in question, a certain event $A$ has occurred. The probability of the event $A$ is obtained by integrating the density function over the set $A$.

$\displaystyle P(A)=\int_{x \in A} f(x) \ dx$

Since the event $A$ has occurred, $P(A)>0$. Since we are dealing with a continuous distribution, the set $A$ would contain an interval, say $(c,d)$ (otherwise $P(A)=0$). So the new probability distribution we define is also a continuous distribution. The following is the density function defined on the new sample space $A$.

$\displaystyle f(x \lvert A)=\frac{f(x)}{P(A)}, \ \ \ \ \ \ \ \ \ x \in A$

The above probability distribution is called the conditional distribution of $X$ given the event $A$, denoted by $X \lvert A$. This new probability distribution incorporates new information about the results of a random experiment.

Once this new probability distribution is established, we can compute various distributional quantities (e.g. cumulative distribution function, mean, variance and other higher moments).

_____________________________________________________________________________________________________________________________

Examples

Example 1

Let $X$ be the lifetime (in years) of a brand new computer purchased from a certain manufacturer. Suppose that the following is the density function of the random variable $X$.

$\displaystyle f(x)=\frac{3}{2500} \ (100x-20x^2 + x^3), \ \ \ \ \ \ \ \ 0<x<10$

Suppose that you have just purchased one such computer that is 2 years old and in good working condition. We have the following questions.

• What is the expected lifetime of this 2-year old computer?
• What is the expected number of years of service that will be provided by this 2-year old computer?

Both calculations are conditional means since the computer in question already survived to age 2. However, there is a slight difference between the two calculations. The first one is the expected age of the 2-year old computer, i.e., the conditional mean $E(X \lvert X>2)$. The second one is the expected remaining lifetime of the 2-year old computer, i.e., $E(X-2 \lvert X>2)$.

For a brand new computer, the sample space is the interval $S=(0,10)$. Knowing that the computer is already 2 years old, the new sample space is $A=(2,10)$. The total probability of the new sample space is:

$\displaystyle P(A)=P(X>2)=\int_{2}^{10} \frac{3}{2500} \ (100x-20x^2 + x^3) \ dx=\frac{2048}{2500}=0.8192$

The conditional density function of $X$ given $X>2$ is:

$\displaystyle \begin{aligned} f(x \lvert X>2)&=\frac{\frac{3}{2500} \ (100x-20x^2 + x^3)} {\frac{2048}{2500}} \\&=\frac{3}{2048} \ (100x-20x^2 + x^3), \ \ \ \ \ \ \ \ \ 2<x<10 \end{aligned}$

The first conditional mean is:

$\displaystyle \begin{aligned} E(X \lvert X>2)&=\int_2^{10} x \ f(x \lvert X>2) \ dx \\&=\int_2^{10} \frac{3}{2048} \ x(100x-20x^2 + x^3) \ dx \\&=\int_2^{10} \frac{3}{2048} \ (100x^2-20x^3 + x^4) \ dx \\&=\frac{47104}{10240}=4.6 \end{aligned}$

The second conditional mean is:

$\displaystyle E(X-2 \lvert X>2)=E(X \lvert X>2)-2=2.6$

In contrast, the unconditional mean is:

$\displaystyle E(X)=\int_0^{10} \frac{3}{2500} \ (100x^2-20x^3 + x^4) \ dx=4$

So if the lifetime of a computer is modeled by the density function $f(x)$ given here, the expected lifetime of a brand new computer is 4 years. If you know that a computer has already been in use for 2 years and is in good condition, the expected lifetime is 4.6 years, of which 2 years have already passed, so the expected remaining lifetime is 2.6 years.

Note that the following calculation is not $E(X \lvert X>2)$, though it is something that some students may attempt to do.

$\displaystyle \int_2^{10} x \ f(x) \ dx =\int_2^{10} \frac{3}{2500} \ x(100x-20x^2 + x^3) \ dx=\frac{47104}{12500}=3.76832$

The above calculation uses the unconditional density $f(x)$ rather than the conditional density given $X>2$, i.e., it omits the division by $P(X>2)$. Also note that the answer is less than the unconditional mean $E(X)$.
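Because the density in Example 1 is a polynomial, all the integrals can be carried out exactly. The following sketch uses Python's `fractions` module to do so; the helper name `integral_x_power_times_f` is an illustrative choice. It prints $P(X>2)$, $E(X \lvert X>2)$ and $E(X-2 \lvert X>2)$ as exact fractions.

```python
from fractions import Fraction

# Coefficients of 100x - 20x^2 + x^3, indexed by the power of x.
COEFFS = {1: 100, 2: -20, 3: 1}
SCALE = Fraction(3, 2500)


def integral_x_power_times_f(power, lo, hi):
    """Exact integral of x^power * f(x) over [lo, hi]."""
    total = Fraction(0)
    for k, c in COEFFS.items():
        n = k + power + 1                     # exponent after integrating x^(k+power)
        total += Fraction(c, n) * (Fraction(hi) ** n - Fraction(lo) ** n)
    return SCALE * total


p_survive = integral_x_power_times_f(0, 2, 10)                # P(X > 2)
cond_mean = integral_x_power_times_f(1, 2, 10) / p_survive    # E(X | X > 2)
print(p_survive, cond_mean, cond_mean - 2)                    # 512/625 23/5 13/5
```

Here 512/625 is 0.8192, 23/5 is 4.6 and 13/5 is 2.6, matching the values above. Skipping the division by `p_survive` gives 3.76832, the incorrect value.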

Example 2 – Exponential Distribution

Work Example 1 again by assuming that the lifetime of the type of computer in question follows the exponential distribution with mean 4 years.

The following is the density function of the lifetime $X$.

$\displaystyle f(x)=0.25 \ e^{-0.25 x}, \ \ \ \ \ \ 0<x<\infty$

The probability that the computer has survived to age 2 is:

$\displaystyle P(X>2)=\int_2^\infty 0.25 \ e^{-0.25 x} \ dx=e^{-0.25 (2)}=e^{-0.5}$

The conditional density function given that $X>2$ is:

$\displaystyle f(x \lvert X>2)= \frac{0.25 \ e^{-0.25 x}}{e^{-0.25 (2)}}=0.25 \ e^{-0.25 (x-2)}, \ \ \ \ \ \ \ 2<x<\infty$

To compute the conditional mean $E(X \lvert X>2)$, we have

$\displaystyle \begin{aligned} E(X \lvert X>2)&=\int_2^\infty x \ f(x \lvert X>2) \ dx \\&=\int_2^\infty 0.25 \ x \ e^{-0.25 (x-2)} \ dx \\&=\int_0^\infty 0.25 \ (u+2) \ e^{-0.25 u} \ du \ \ \ (\text{change of variable}) \\&=\int_0^\infty 0.25 \ u \ e^{-0.25 u} \ du+2\int_0^\infty 0.25 \ e^{-0.25 u} \ du \\&=\frac{1}{0.25}+2=4+2=6\end{aligned}$

Then $\displaystyle E(X-2 \lvert X>2)=E(X \lvert X>2)-2=6-2=4$.

We have an interesting result here. The expected lifetime of a brand new computer is 4 years. Yet the remaining lifetime for a 2-year old computer is still 4 years! This is the no-memory property of the exponential distribution – if the lifetime of a type of machine is distributed according to an exponential distribution, it does not matter how old the machine is, the expected remaining lifetime is always the same as the unconditional mean! This point indicates that the exponential distribution is not appropriate for modeling the lifetime of machines or biological lives that wear out over time.
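The no-memory property can also be checked numerically. The sketch below approximates $E(X-t \lvert X>t)$ for the exponential density with mean 4 at several ages $t$, using a hand-written composite Simpson's rule and truncating the infinite integrals at $t+200$ (an arbitrary cutoff beyond which the density is negligible). The function names are illustrative.

```python
import math


def simpson(g, lo, hi, n=20000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def mean_residual_life(pdf, t, upper):
    """Approximate E(X - t | X > t), truncating the integrals at `upper`."""
    survival = simpson(pdf, t, upper)                          # P(X > t)
    excess = simpson(lambda x: (x - t) * pdf(x), t, upper)     # integral of (x - t) f(x)
    return excess / survival


def exp_pdf(x, rate=0.25):
    """Exponential density with mean 1/rate = 4."""
    return rate * math.exp(-rate * x)


for t in (0, 2, 5, 10):
    print(t, round(mean_residual_life(exp_pdf, t, t + 200), 4))
```

Each line of output shows a mean residual life of approximately 4, regardless of the age $t$.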

_____________________________________________________________________________________________________________________________

Mean Residual Life

If a 40-year old man who is a non-smoker wants to purchase a life insurance policy, the insurance company is interested in knowing the expected remaining lifetime of the prospective policyholder. This information will help determine the pricing of the life insurance policy. The expected remaining lifetime of the prospective policyholder is called the mean residual life and is the conditional mean $E(X-t \lvert X>t)$ where $X$ is a model for the lifetime of some life.

In engineering and manufacturing applications, probability modeling of lifetimes of objects (e.g. devices, systems or machines) is known as reliability theory. The mean residual life also plays an important role in such applications.

Thus if the random variable $X$ is a lifetime model (lifetime of a life, system or device), then the conditional mean $E(X-t \lvert X>t)$ is called the mean residual life and is the expected remaining lifetime of the life or system in question given that the life has survived to age $t$.

On the other hand, if the random variable $X$ is a model of insurance losses, then the conditional mean $E(X-t \lvert X>t)$ is the expected claim payment per loss given that the loss has exceeded the deductible of $t$. In this interpretation, the conditional mean $E(X-t \lvert X>t)$ is called the mean excess loss function.

_____________________________________________________________________________________________________________________________

Summary

In conclusion, we summarize the approach for calculating the two conditional means demonstrated in the above examples.

Suppose $X$ is a continuous random variable with the support being $(0,\infty)$ (the positive real numbers), with $f(x)$ being the density function. The following is the density function of the conditional probability distribution given that $X>t$.

$\displaystyle f(x \lvert X>t)=\frac{f(x)}{P(X>t)}, \ \ \ \ \ \ \ \ \ x>t$

Then we have the two conditional means:

$\displaystyle E(X \lvert X>t)=\int_t^\infty x \ f(x \lvert X>t) \ dx=\int_t^\infty x \ \frac{f(x)}{P(X>t)} \ dx$

$\displaystyle E(X-t \lvert X>t)=\int_t^\infty (x-t) \ f(x \lvert X>t) \ dx=\int_t^\infty (x-t) \ \frac{f(x)}{P(X>t)} \ dx$

If $E(X \lvert X>t)$ is calculated first (or is easier to calculate), then $E(X-t \lvert X>t)=E(X \lvert X>t)-t$, as shown in the above examples.

If $X$ is a discrete random variable, then the integrals are replaced by summation symbols. As indicated above, the conditional mean $E(X-t \lvert X>t)$ is called the mean residual life when $X$ is a probability model of the lifetime of some system or life.
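For the discrete case, the following is a minimal sketch in which the probability function is stored as a dictionary mapping values to probabilities. The fair die used here is a hypothetical illustration, not an example from this post, and the function name is an illustrative choice.

```python
from fractions import Fraction


def discrete_mean_residual_life(pmf, t):
    """E(X - t | X > t) for a discrete distribution given as {value: probability}."""
    tail = {x: p for x, p in pmf.items() if x > t}     # outcomes in the event X > t
    prob_tail = sum(tail.values())                     # P(X > t)
    if prob_tail == 0:
        raise ValueError("P(X > t) is zero")
    return sum((x - t) * p for x, p in tail.items()) / prob_tail


# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(discrete_mean_residual_life(die, 3))   # 2
```

Given that the roll exceeds 3, the outcomes 4, 5 and 6 are equally likely, so the conditional mean is 5 and the mean excess over 3 is 2.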

_____________________________________________________________________________________________________________________________

Practice Problems

Practice problems are found in the companion blog.

_____________________________________________________________________________________________________________________________

$\copyright \ \text{2013 by Dan Ma}$