# The distributions of the order statistics

Sample statistics such as the sample median, the sample quartiles, and the sample minimum and maximum play a prominent role in the analysis of empirical data (e.g. in descriptive statistics and exploratory data analysis (EDA)). In this post we discuss order statistics and their distributions. The order statistics are the items of a random sample arranged in increasing order. The focus here is on the distribution functions and probability density functions of the order statistics. Order statistics are important tools in non-parametric statistical inference. In subsequent posts, we will present examples of applications in non-parametric methods.

In this post, we only consider random samples obtained from a continuous distribution (i.e. the distribution function is a continuous function). Let $X_1,X_2, \cdots, X_n$ be a random sample of size $n$ from a continuous distribution with distribution function $F(x)$. We order the random sample in increasing order and obtain $Y_1,Y_2, \cdots, Y_n$. In other words, we have:

$Y_1=$ the smallest of $X_1,X_2, \cdots, X_n$
$Y_2=$ the second smallest of $X_1,X_2, \cdots, X_n$
$\cdot$
$\cdot$
$\cdot$
$Y_n=$ the largest of $X_1,X_2, \cdots, X_n$

We set $Y_{min}=Y_1$ and $Y_{max}=Y_n$. The order statistic $Y_i$ is called the $i^{th}$ order statistic. Since we are working with a continuous distribution, the probability of two sample items being equal is zero. Thus we can assume that $Y_1<Y_2<\cdots<Y_n$. That is, the probability of a tie is zero among the order statistics.

The Distribution Functions of the Order Statistics
The distribution function of $Y_i$ is an upper tail of a binomial distribution. If the event $Y_i \le y$ occurs, then there are at least $i$ many $X_j$ in the sample that are less than or equal to $y$. Consider the event that $X \le y$ as a success and $F(y)=P[X \le y]$ as the probability of success. Then the drawing of each sample item becomes a Bernoulli trial (a success or a failure). We are interested in the probability of having at least $i$ many successes. Thus the following is the distribution function of $Y_i$:

$\displaystyle F_{Y_i}(y)=P[Y_i \le y]=\sum \limits_{k=i}^{n} \binom{n}{k} F(y)^k [1-F(y)]^{n-k}\ \ \ \ \ \ \ \ \ \ \ \ (1)$
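As a rough numerical illustration of (1), the sketch below evaluates the binomial upper tail and compares it with a simulation. The choice of a uniform$(0,1)$ population (so that $F(y)=y$), the function names, the seed and the number of trials are all assumptions made for this illustration, not part of the derivation.

```python
import math
import random

def order_stat_cdf(i, n, p):
    """P[Y_i <= y] from formula (1), where p = F(y)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, n + 1))

def simulate_cdf(i, n, y, trials=20000, seed=0):
    """Monte Carlo estimate of P[Y_i <= y] for a uniform(0, 1) sample (assumed population)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        if sample[i - 1] <= y:  # sample[i - 1] is the i-th order statistic
            hits += 1
    return hits / trials

# Third order statistic in a sample of size 5, evaluated at y = 0.4.
exact = order_stat_cdf(3, 5, 0.4)
approx = simulate_cdf(3, 5, 0.4)
print(exact, approx)
```

The two printed numbers should be close, with the simulated value fluctuating around the exact one.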

The following relationship, obtained by separating the $k=i-1$ term from the sum in (1) for $F_{Y_{i-1}}(y)$, is used in deriving the probability density function:

$\displaystyle F_{Y_i}(y)=F_{Y_{i-1}}(y)-\binom{n}{i-1} F(y)^{i-1} [1-F(y)]^{n-i+1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$
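A quick way to convince oneself of (2) is to evaluate both sides numerically. The sketch below does this for $n=11$ and a few arbitrary values of $F(y)$; the helper names and the test values are assumptions chosen only for illustration.

```python
import math

def order_stat_cdf(i, n, p):
    """Left side of (2): formula (1) evaluated at p = F(y)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, n + 1))

def rhs_of_relation_2(i, n, p):
    """Right side of (2): the cdf of Y_{i-1} minus the k = i-1 binomial term."""
    return order_stat_cdf(i - 1, n, p) - math.comb(n, i - 1) * p**(i - 1) * (1 - p)**(n - i + 1)

n = 11
# Largest discrepancy between the two sides over i = 2, ..., n and a few values of F(y).
max_gap = max(
    abs(order_stat_cdf(i, n, p) - rhs_of_relation_2(i, n, p))
    for i in range(2, n + 1)
    for p in (0.1, 0.25, 0.5, 0.9)
)
print(max_gap)
```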

The Probability Density Functions of the Order Statistics
The probability density function of $Y_i$ is given by:

$\displaystyle f_{Y_i}(y)=\frac{n!}{(i-1)! (n-i)!} \thinspace F(y)^{i-1} \thinspace [1-F(y)]^{n-i} f_X(y) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

We prove this by induction. Consider $i=1$. Note that $F_{Y_1}(y)$ is the probability that at least one $X_j \le y$, which is the complement of the probability that no $X_j \le y$. Thus $F_{Y_1}(y)=1-[1-F(y)]^n$. Taking the derivative, we have:

$\displaystyle f_{Y_1}(y)=F_{Y_1}^{'}(y)=n [1-F(y)]^{n-1} f_X(y)$

As the induction hypothesis, suppose that (3) holds for $Y_{i-1}$, so that:

$\displaystyle f_{Y_{i-1}}(y)=\frac{n!}{(i-2)! (n-i+1)!} \thinspace F(y)^{i-2} \thinspace [1-F(y)]^{n-i+1} f_X(y)$

Now we take the derivative of (2) above and we have:

$\displaystyle f_{Y_i}(y)=f_{Y_{i-1}}(y)-\biggl[(i-1)\binom{n}{i-1} F(y)^{i-2} f_X(y)[1-F(y)]^{n-i+1}$
$\displaystyle -\ \ \ \ \ \ \ \ \ \ \binom{n}{i-1}F(y)^{i-1}(n-i+1)[1-F(y)]^{n-i} f_X(y) \biggr]$

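To simplify the right hand side, note that $(i-1)\binom{n}{i-1}=\frac{n!}{(i-2)! (n-i+1)!}$, so the first term inside the brackets equals $f_{Y_{i-1}}(y)$ and cancels it. The remaining term has coefficient $(n-i+1)\binom{n}{i-1}=\frac{n!}{(i-1)! (n-i)!}$, which gives the pdf of $Y_i$ as in (3).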
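The relationship between (1) and (3) can also be checked numerically: a central difference of the distribution function in (1) should agree with the pdf in (3). The sketch below assumes an exponential population with rate 1 (an arbitrary choice of continuous distribution for this check); the function names and the step size `h` are likewise illustrative.

```python
import math

def F(x):
    """Distribution function of the assumed exponential(1) population."""
    return 1 - math.exp(-x)

def f(x):
    """Density of the assumed exponential(1) population."""
    return math.exp(-x)

def order_stat_cdf(i, n, y):
    """Formula (1)."""
    p = F(y)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, n + 1))

def order_stat_pdf(i, n, y):
    """Formula (3)."""
    coeff = math.factorial(n) // (math.factorial(i - 1) * math.factorial(n - i))
    return coeff * F(y)**(i - 1) * (1 - F(y))**(n - i) * f(y)

def numeric_derivative(i, n, y, h=1e-5):
    """Central-difference approximation to the derivative of formula (1)."""
    return (order_stat_cdf(i, n, y + h) - order_stat_cdf(i, n, y - h)) / (2 * h)

n = 7
# Worst disagreement between (3) and the numerical derivative of (1).
worst = max(
    abs(order_stat_pdf(i, n, y) - numeric_derivative(i, n, y))
    for i in range(1, n + 1)
    for y in (0.2, 0.7, 1.5, 3.0)
)
print(worst)
```

The printed discrepancy should be tiny, limited only by the finite-difference approximation.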

Comments
We would like to make two comments. The first is that, in terms of problem solving, it may be better to start from the distribution function in (1) and differentiate it to obtain the pdf, since the thought process behind (1) is clear. The second is that the last three terms in the pdf in (3) are very instructive. Let's arrange these three terms as follows:

$\displaystyle F(y)^{i-1} \thinspace f_X(y) \thinspace [1-F(y)]^{n-i}$

Note that the first term is the probability that $i-1$ given sample items fall below $y$. The middle term indicates that one sample item is right around $y$. The third term is the probability that the remaining $n-i$ items fall above $y$. Thus the pdf in (3) can be read as the following multinomial probability:

$\displaystyle f_{Y_i}(y)=\frac{n!}{(i-1)! 1! (n-i)!} \thinspace F(y)^{i-1} \thinspace f_X(y) \thinspace [1-F(y)]^{n-i} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$

This heuristic approach is further described here.

Example
Suppose that a sample of size $n=11$ is drawn from the uniform distribution on the interval $(0, \theta)$. Find the pdfs for $Y_{min}=Y_1$, $Y_{max}=Y_{11}$ and $Y_6$. Find $E[Y_6]$.

Let $X \sim uniform(0,\theta)$. The distribution function and pdf of $X$ are:

$\displaystyle F(y)=\left\{\begin{matrix}0&\thinspace y<0\\{\displaystyle \frac{y}{\theta}}&\thinspace 0 \le y < \theta\\{1}&\thinspace y \ge \theta\end{matrix}\right.$

$\displaystyle f(y)=\left\{\begin{matrix}\displaystyle \frac{1}{\theta}&\thinspace 0 < y < \theta\\{0}&\thinspace \text{otherwise}\end{matrix}\right.$

Using (3), the following are the pdfs of $Y_1$, $Y_{11}$ and $Y_6$.

$\displaystyle f_{Y_1}(y)=\frac{11}{\theta^{11}} (\theta-y)^{10}$

$\displaystyle f_{Y_{11}}(y)=\frac{11}{\theta^{11}} y^{10}$

$\displaystyle f_{Y_6}(y)=2772 \biggl(\frac{y}{\theta}\biggr)^5 \biggl(1-\frac{y}{\theta}\biggr)^5 \frac{1}{\theta}$

In this example, $Y_6$ is the sample median and serves as a point estimate for the population median $\frac{\theta}{2}$. As an estimator of the median, we prefer that $Y_6$ neither overestimate nor underestimate $\frac{\theta}{2}$ on average (such an estimator is called an unbiased estimator). In this particular example, the sample median $Y_6$ is an unbiased estimator of $\frac{\theta}{2}$. To see this, we show that $E[Y_6]=\frac{\theta}{2}$.

$\displaystyle E[Y_6]=\int_0^{\theta}2772 y \biggl(\frac{y}{\theta}\biggr)^5 \biggl(1-\frac{y}{\theta}\biggr)^5 \frac{1}{\theta} dy$

By substituting $w=\frac{y}{\theta}$, we have the following beta integral.

$\displaystyle E[Y_6]=2772 \theta \int_0^1 w^{7-1} (1-w)^{6-1} dw$

$\displaystyle E[Y_6]=2772 \theta \thinspace \frac{\Gamma(7) \Gamma(6)}{\Gamma(13)}=2772 \theta \thinspace \frac{6! \thinspace 5!}{12!}=\frac{\theta}{2}$
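The arithmetic above can be double-checked in two ways: exactly, using rational arithmetic for the constant $2772 \cdot \frac{6! \thinspace 5!}{12!}$, and approximately, by simulating sample medians. In the sketch below, the value $\theta=3$, the seed and the number of trials are arbitrary assumptions made only for the simulation.

```python
import math
import random
from fractions import Fraction

# Exact check: the coefficient in f_{Y_6} and the ratio E[Y_6] / theta.
coeff = math.factorial(11) // (math.factorial(5) * math.factorial(5))
beta_integral = Fraction(math.factorial(6) * math.factorial(5), math.factorial(12))
exact_ratio = coeff * beta_integral

def simulate_median_mean(theta, n=11, trials=20000, seed=1):
    """Average of the sample median Y_6 over many uniform(0, theta) samples of size 11."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = sorted(rng.uniform(0, theta) for _ in range(n))
        total += sample[n // 2]  # index 5 holds the 6th order statistic
    return total / trials

theta = 3.0  # arbitrary value chosen for the simulation
estimate = simulate_median_mean(theta)
print(coeff, exact_ratio, estimate)
```

The exact ratio confirms $E[Y_6]=\frac{\theta}{2}$, and the simulated average should land near $\frac{\theta}{2}=1.5$ for the assumed $\theta$.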

________________________________________________________________________

Practice problems

Practice problems are found here in a companion blog.

________________________________________________________________________
$\copyright \ \text{2010 - 2015 by Dan Ma}$ Revised April 6, 2015.

## 6 thoughts on “The distributions of the order statistics”

1. Thanks. Very nice, really clear and intuitive!!

2. Thank you so much.
Can you please provide formulas for the case where the random variables are not identically distributed?