# The order statistics and the uniform distribution

In this post, we show that the order statistics of a random sample from the uniform distribution on the unit interval follow beta distributions. This leads to a discussion of estimating percentiles using order statistics. We also present an example of using order statistics to construct confidence intervals for population percentiles. For a discussion of the distributions of order statistics of random samples drawn from a continuous distribution, see the previous post *The distributions of the order statistics*.

Suppose that we have a random sample of size $n$ from a continuous distribution with common distribution function $F_X(x)=F(x)$ and common density function $f_X(x)=f(x)$. The order statistics $Y_1<Y_2<\cdots<Y_n$ are obtained by sorting the sample in ascending order. In other words, $Y_1$ is the smallest item in the sample, $Y_2$ is the second smallest item, and so on. Since we are sampling from a continuous distribution, the probability of a tie between two order statistics is zero. In the previous post *The distributions of the order statistics*, we derived the probability density function of the $i^{th}$ order statistic:

$\displaystyle f_{Y_i}(y)=\frac{n!}{(i-1)! (n-i)!} \thinspace F(y)^{i-1} \thinspace [1-F(y)]^{n-i} f(y)$
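As a sketch, the density formula can be coded directly for any continuous distribution whose distribution function and density are available as Python callables. The function name `order_stat_pdf` and the choice of an Exponential(1) test distribution are illustrative assumptions. The midpoint-rule integral is only a sanity check that the formula defines a density.

```python
import math

def order_stat_pdf(y, i, n, cdf, pdf):
    """Density of the i-th order statistic (1-based) of a random sample of size n
    from a continuous distribution with distribution function `cdf` and density `pdf`."""
    if not 1 <= i <= n:
        raise ValueError("need 1 <= i <= n")
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    Fy = cdf(y)
    return coef * Fy ** (i - 1) * (1.0 - Fy) ** (n - i) * pdf(y)

# Sanity check with an assumed Exponential(1) distribution: the density of Y_2
# (n = 5) should integrate to about 1 (midpoint rule on [0, 40]).
exp_cdf = lambda y: 1.0 - math.exp(-y)
exp_pdf = lambda y: math.exp(-y)
step = 0.001
total = sum(order_stat_pdf((k + 0.5) * step, 2, 5, exp_cdf, exp_pdf) for k in range(40_000)) * step
print(f"integral of f_Y2 over [0, 40]: {total:.6f}")
```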

## The Order Statistics of the Uniform Distribution
Suppose that the random sample $X_1,X_2, \cdots, X_n$ is drawn from $U(0,1)$. The distribution function of $U(0,1)$ is $F(y)=y$ and its density function is $f(y)=1$, where $0<y<1$. Hence the probability density function of the $i^{th}$ order statistic is:

$\displaystyle f_{Y_i}(y)=\frac{n!}{(i-1)! (n-i)!} \thinspace y^{i-1} \thinspace [1-y]^{n-i}$ where $0<y<1$.

The above density function is from the family of beta distributions. In general, the pdf of a beta distribution and its mean and variance are:

$\displaystyle f_{W}(w)=\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \thinspace w^{a-1} \thinspace [1-w]^{b-1}$ where $0<w<1$, the parameters satisfy $a>0$ and $b>0$, and $\Gamma(\cdot)$ is the gamma function.

$\displaystyle E[W]=\frac{a}{a+b}$

$\displaystyle Var[W]=\frac{ab}{(a+b)^2 (a+b+1)}$
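A quick numerical check, not a proof, can confirm that the uniform order-statistic density given earlier is the beta density with $a=i$ and $b=n-i+1$. The sketch below codes both formulas with `math.gamma` and `math.factorial` and compares them at a few points. The sample size $n=7$ and the evaluation points are arbitrary choices.

```python
import math

def beta_pdf(w, a, b):
    """Beta(a, b) density for 0 < w < 1, written with the gamma function."""
    return math.gamma(a + b) / (math.gamma(a) * math.gamma(b)) * w ** (a - 1) * (1 - w) ** (b - 1)

def uniform_order_stat_pdf(y, i, n):
    """Density of the i-th order statistic of n draws from U(0,1), for 0 < y < 1."""
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return coef * y ** (i - 1) * (1 - y) ** (n - i)

n = 7
# Largest relative gap between the two formulas over all i and a few points y.
max_rel_err = max(
    abs(uniform_order_stat_pdf(y, i, n) - beta_pdf(y, i, n - i + 1)) / beta_pdf(y, i, n - i + 1)
    for i in range(1, n + 1)
    for y in (0.1, 0.35, 0.8)
)
print(f"largest relative difference: {max_rel_err:.2e}")
```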

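Since $\Gamma(k)=(k-1)!$ for every positive integer $k$, the pdf of the $i^{th}$ order statistic of the uniform distribution on the unit interval is the beta density with $a=i$ and $b=n-i+1$. The following shows this pdf and the resulting mean and variance: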

$\displaystyle f_{Y_i}(y)=\frac{\Gamma(n+1)}{\Gamma(i) \Gamma(n-i+1)} \thinspace y^{i-1} \thinspace [1-y]^{(n-i+1)-1}$ where $0<y<1$.

$\displaystyle E[Y_i]=\frac{i}{i+(n-i+1)}=\frac{i}{n+1}$

$\displaystyle Var[Y_i]=\frac{i(n-i+1)}{(n+1)^2 (n+2)}$
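The mean and variance can be checked by simulation. The sketch below draws repeated uniform samples with Python's `random` module and takes the $i^{th}$ smallest value from each sample. It then compares the sample mean and variance of these values with the exact values $\frac{i}{n+1}$ and $\frac{i(n-i+1)}{(n+1)^2(n+2)}$, which it computes as fractions. The choice $n=5$, $i=2$, the seed and the number of repetitions are arbitrary.

```python
import random
from fractions import Fraction

def exact_moments(i, n):
    """Exact mean and variance of Y_i ~ Beta(i, n - i + 1)."""
    mean = Fraction(i, n + 1)
    var = Fraction(i * (n - i + 1), (n + 1) ** 2 * (n + 2))
    return mean, var

def simulated_moments(i, n, reps=100_000, seed=1):
    """Monte Carlo mean and variance of the i-th smallest of n U(0,1) draws."""
    rng = random.Random(seed)
    draws = [sorted(rng.random() for _ in range(n))[i - 1] for _ in range(reps)]
    m = sum(draws) / reps
    v = sum((d - m) ** 2 for d in draws) / (reps - 1)
    return m, v

n, i = 5, 2
exact_mean, exact_var = exact_moments(i, n)
sim_mean, sim_var = simulated_moments(i, n)
print(exact_mean, exact_var)              # 1/3 2/63
print(f"{sim_mean:.4f} {sim_var:.4f}")    # close to 0.3333 and 0.0317
```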

## Estimation of Percentiles
In descriptive statistics, we define the sample percentiles using the order statistics, even though the term order statistics may not be used in a non-calculus-based introductory statistics course. For example, if the sample size is an odd integer $n=2m+1$, then the sample median is the order statistic $Y_{m+1}$. The preceding discussion on the order statistics of the uniform distribution shows that this approach is a sound one.

Suppose we have a random sample of size $n$ from an arbitrary continuous distribution. The order statistics listed in ascending order are:

$\displaystyle Y_1<Y_2<\cdots<Y_n$

For each $i \le n$, consider $W_i=F(Y_i)$. Since the distribution function $F(x)$ is non-decreasing, the $W_i$ are also in ascending order (ties occur with probability zero):

$\displaystyle W_1<W_2<\cdots<W_n$

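It can be shown that if $F(x)$ is the distribution function of a continuous random variable $X$, then the transformed random variable $F(X)$ follows the uniform distribution $U(0,1)$. Then each $F(X_j) \sim U(0,1)$. The $W_i$ are therefore the order statistics of a random sample from $U(0,1)$, namely the following transformed random sample. In particular, $W_i$ has the beta distribution with $a=i$ and $b=n-i+1$ derived above.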

$\displaystyle F(X_1),F(X_2), \cdots, F(X_n)$
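The fact that $F(X)\sim U(0,1)$ (the probability integral transform) can be illustrated numerically. The sketch assumes an exponential distribution with rate 1.5 as the "arbitrary" continuous distribution. It applies the distribution function $F(x)=1-e^{-1.5x}$ to simulated draws and checks a few features of $U(0,1)$: the mean should be near $\frac{1}{2}$, and the proportion of values below any $u$ should be near $u$.

```python
import math
import random

rate = 1.5                                   # assumed rate of the exponential distribution
F = lambda x: 1.0 - math.exp(-rate * x)      # its distribution function

rng = random.Random(2)
us = [F(rng.expovariate(rate)) for _ in range(50_000)]   # transformed sample F(X_j)

mean_u = sum(us) / len(us)
share_below = {u: sum(v < u for v in us) / len(us) for u in (0.1, 0.25, 0.5, 0.9)}
print(f"mean of F(X): {mean_u:.3f}")
for u, share in share_below.items():
    print(f"P(F(X) < {u}) is about {share:.3f}")
```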

By the preceding discussion, $\displaystyle E[W_i]=E[F(Y_i)]=\frac{i}{n+1}$. Note that $F(Y_i)$ is the area under the density function $f(x)$ and to the left of $Y_i$. Thus $F(Y_i)$ is a random area and $E[W_i]=E[F(Y_i)]$ is the expected area under the density curve $f(x)$ to the left of $Y_i$.

For example, suppose the sample size is an odd integer $n=2m+1$. Then the sample median is $Y_{m+1}$, and $\displaystyle E[W_{m+1}]=\frac{m+1}{n+1}=\frac{m+1}{2m+2}=\frac{1}{2}$. Thus if we choose $Y_{m+1}$ as a point estimate for the population median, the expected proportion of the population lying below $Y_{m+1}$ is 50%.

Furthermore, for $2 \le i \le n$, $E[W_i - W_{i-1}]$ is the expected area under the density curve between $Y_{i-1}$ and $Y_i$. This expected area is:

$\displaystyle E[W_i - W_{i-1}]=E[F(Y_i)]-E[F(Y_{i-1})]=\frac{i}{n+1}-\frac{i-1}{n+1}=\frac{1}{n+1}$

The expected area under the density curve and to the right of the maximum order statistic $Y_n$ is:

$\displaystyle E[1-F(Y_n)]=1-\frac{n}{n+1}=\frac{1}{n+1}$

Consequently, we have an interesting observation about the order statistics $Y_1<Y_2<\cdots<Y_n$. They divide the area under the density curve $f(x)$ and above the x-axis into $n+1$ regions: the region to the left of $Y_1$, the regions between consecutive order statistics, and the region to the right of $Y_n$. On average, each of these regions has area $\displaystyle \frac{1}{n+1}$.
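The claim that the order statistics cut the area under $f(x)$ into $n+1$ pieces of expected size $\frac{1}{n+1}$ does not depend on $f$. As an illustration, the sketch below again assumes an Exponential(1) population and uses a sample size of $n=4$. For each simulated sample, it computes the areas $F(Y_1)$, $F(Y_i)-F(Y_{i-1})$ and $1-F(Y_n)$, then averages them across samples. Each average should be near $\frac{1}{5}$.

```python
import math
import random

def average_areas(n, reps=50_000, seed=4):
    """Average area under an Exponential(1) density in each of the n + 1 regions
    cut out by the order statistics Y_1 < ... < Y_n."""
    F = lambda x: 1.0 - math.exp(-x)
    rng = random.Random(seed)
    sums = [0.0] * (n + 1)
    for _ in range(reps):
        w = [F(y) for y in sorted(rng.expovariate(1.0) for _ in range(n))]
        edges = [0.0] + w + [1.0]             # 0, F(Y_1), ..., F(Y_n), 1
        for k in range(n + 1):
            sums[k] += edges[k + 1] - edges[k]
    return [s / reps for s in sums]

areas = average_areas(4)
print([round(a, 3) for a in areas])   # each entry close to 1/5 = 0.2
```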

As a result, it makes sense to use order statistics as estimators of percentiles. For example, we can use $Y_i$ as the $(100p)^{th}$ percentile of the sample, where $\displaystyle p=\frac{i}{n+1}$. Then $Y_i$ is an estimator of the population percentile $\tau_{p}$, the point such that the area under the density curve $f(x)$ to the left of $\tau_{p}$ is $p$. If $(n+1)p$ is not an integer, we interpolate between two adjacent order statistics. For example, if $(n+1)p=5.7$, then we interpolate between $Y_5$ and $Y_6$.
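The estimation rule can be written as a short function. This is a sketch of the rule described here, which uses position $(n+1)p$ with linear interpolation between adjacent order statistics. The function name `order_stat_percentile` is hypothetical. The function refuses values of $p$ for which $(n+1)p$ falls outside $[1, n]$, since the rule gives no order statistic there. Statistical software packages offer several other percentile conventions.

```python
import math

def order_stat_percentile(sample, p):
    """Estimate the population (100p)-th percentile from the order statistics,
    interpolating linearly at position (n + 1) * p."""
    ys = sorted(sample)
    n = len(ys)
    h = (n + 1) * p
    if not 1 <= h <= n:
        raise ValueError("(n + 1) * p must lie between 1 and n")
    k = math.floor(h)          # Y_k is the order statistic at or just below position h
    frac = h - k               # weight given to Y_{k+1}
    if frac == 0:
        return ys[k - 1]       # (n + 1) * p is an integer: use Y_k itself
    return (1 - frac) * ys[k - 1] + frac * ys[k]

# Hypothetical data with n = 11 whose i-th order statistic equals i.
sample = [float(v) for v in range(11, 0, -1)]
print(order_stat_percentile(sample, 0.5))    # 6.0
```

With $n=11$ and $p=0.85$, the position is $12(0.85)=10.2$, so the function returns $0.8Y_{10}+0.2Y_{11}$.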

## Example
Suppose we have a random sample of size $n=11$ drawn from a continuous distribution. Find estimators for the median, the first quartile and the third quartile. Find an estimate for the $85^{th}$ percentile. Construct an 87% confidence interval for the $40^{th}$ percentile.

Since $n+1=12$, the estimator for the median is $Y_6$ ($12 \times 0.5=6$). The estimator for the first quartile ($25^{th}$ percentile) is $Y_3$ ($12 \times 0.25=3$). The estimator for the third quartile ($75^{th}$ percentile) is $Y_9$ ($12 \times 0.75=9$). Based on the preceding discussion, the expected areas under the density curve $f(x)$ to the left of $Y_3$, $Y_6$ and $Y_9$ are $\frac{3}{12}=0.25$, $\frac{6}{12}=0.5$ and $\frac{9}{12}=0.75$, respectively.

To estimate the $85^{th}$ percentile, note that $(n+1)p=12(0.85)=10.2$. Thus we interpolate between $Y_{10}$ and $Y_{11}$. In this example, we use linear interpolation, which places the estimate 20% of the way from $Y_{10}$ to $Y_{11}$. Taking the arithmetic average of $Y_{10}$ and $Y_{11}$ is a simpler alternative. The following is the resulting estimate of the $85^{th}$ percentile:

$\displaystyle \hat{\tau}_{0.85}=0.8Y_{10}+0.2Y_{11}$

To find the confidence interval, consider the probability $P[Y_2 < \tau_{0.4} < Y_7]$, where $\tau_{0.4}$ is the $40^{th}$ percentile. Call an observation a success if $X \le \tau_{0.4}$, which happens with probability $p=0.4$. The number of successes in the sample then has a binomial distribution with $n=11$ and $p=0.4$. The event $Y_2 < \tau_{0.4} < Y_7$ happens exactly when there are at least 2 and fewer than 7 successes. Thus we have:

$\displaystyle P[Y_2 < \tau_{0.4} < Y_7]=\sum \limits_{j=2}^{6} \binom{11}{j} 0.4^{j} 0.6^{11-j}=0.8704$
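The binomial sum is easy to evaluate exactly. The sketch below uses `math.comb` (Python 3.8+) and a hypothetical helper `coverage(n, p, r, s)` for the probability $P[Y_r < \tau_p < Y_s]$. By the argument above, this equals the probability that between $r$ and $s-1$ of the $n$ observations fall below $\tau_p$.

```python
from math import comb

def coverage(n, p, r, s):
    """P(Y_r < tau_p < Y_s): probability that the number of observations
    below tau_p, a Binomial(n, p) count, is one of r, r + 1, ..., s - 1."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(r, s))

print(round(coverage(11, 0.4, 2, 7), 4))   # 0.8704
```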

Thus the interval $(Y_2,Y_7)$ can be taken as an 87% confidence interval for $\tau_{0.4}$ (the exact confidence level is 87.04%). This is an example of a distribution-free confidence interval: apart from continuity, nothing is assumed about the underlying distribution in the construction of the confidence interval.
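The distribution-free property can be seen in a simulation. The sketch below draws samples of size 11 from two quite different continuous distributions, an Exponential(1) and a standard normal, both chosen as assumptions for illustration. It records how often $(Y_2,Y_7)$ covers the true $40^{th}$ percentile of each population. Both proportions should be close to 0.8704.

```python
import math
import random
from statistics import NormalDist

def empirical_coverage(draw, tau, n=11, r=2, s=7, reps=40_000, seed=3):
    """Fraction of simulated samples of size n whose interval (Y_r, Y_s) contains tau."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ys = sorted(draw(rng) for _ in range(n))
        if ys[r - 1] < tau < ys[s - 1]:
            hits += 1
    return hits / reps

p = 0.4
# Assumed populations: Exponential(1), whose 40th percentile is -ln(0.6),
# and the standard normal, whose 40th percentile comes from NormalDist.inv_cdf.
exp_cov = empirical_coverage(lambda g: g.expovariate(1.0), -math.log(1 - p))
norm_cov = empirical_coverage(lambda g: g.gauss(0.0, 1.0), NormalDist().inv_cdf(p))
print(f"exponential: {exp_cov:.3f}, normal: {norm_cov:.3f}")   # both near 0.870
```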

## 7 thoughts on “The order statistics and the uniform distribution”

1. Hi,

For me, as a beginner in mathematical statistics, this is useful stuff with a good explanation. Only the definition of the Beta function is incorrect. It took me an hour of frustrating computation before I figured out that you just forgot a w^(-1) in the beta function ;)

Peter

2. Peter

I am glad you found the post useful with good explanation. Thanks for pointing out the incorrect definition of beta distribution, which has been corrected.

Dan

3. Most of the examples on the distribution of the sample median from U(0,1) consider an odd sample size. I tried obtaining the distribution of the sample median from U(0,1) when n=4. I got the joint distribution of X(2) and X(3) and tried transforming it to w=X(3) and m=(X(2)+X(3))/2 using the Jacobian technique, then integrating out w to get the marginal of m. However, when I tried to verify that the density of m is really a pdf, it integrates to zero. Do you have the correct procedure for this?

4. It is interesting that the distribution of the order statistics is in the same family as the original distribution, because the uniform distribution is a special case of the beta distribution. Do we know necessary and sufficient conditions under which the transformation from the original distribution to the distribution of the order statistics maps a family into itself, meaning that if you start with one family you get the same family? For example, I guess the distribution of the order statistics of the normal distribution is not a normal distribution.

Thanks for a wonderful post.

Michael

5. Reblogged this on Samchappelle's Blog and commented:
Great post about order statistics and their importance in non-parametric methods.