The order statistics and the uniform distribution

In this post, we show that the order statistics of the uniform distribution on the unit interval follow beta distributions. This leads to a discussion of the estimation of percentiles using order statistics. We also present an example of using order statistics to construct confidence intervals for population percentiles. For a discussion of the distributions of order statistics of random samples drawn from a continuous distribution, see the previous post The distributions of the order statistics.

Suppose that we have a random sample of size n from a continuous distribution with common distribution function F_X(x)=F(x) and common density function f_X(x)=f(x). The order statistics Y_1<Y_2< \cdots <Y_n are obtained by arranging the sample in ascending order. In other words, Y_1 is the smallest item in the sample, Y_2 is the second smallest item, and so on. Since the sample is drawn from a continuous distribution, the probability of a tie between two order statistics is zero, so the strict inequalities hold with probability one. In the previous post The distributions of the order statistics, we derived the probability density function of the i^{th} order statistic:

\displaystyle f_{Y_i}(y)=\frac{n!}{(i-1)! (n-i)!} \thinspace F(y)^{i-1} \thinspace [1-F(y)]^{n-i} f(y)

The Order Statistics of the Uniform Distribution
Suppose that the random sample X_1,X_2, \cdots, X_n is drawn from U(0,1). Since the distribution function of U(0,1) is F(y)=y and its density function is f(y)=1 for 0<y<1, the probability density function of the i^{th} order statistic is:

\displaystyle f_{Y_i}(y)=\frac{n!}{(i-1)! (n-i)!} \thinspace y^{i-1} \thinspace [1-y]^{n-i} where 0<y<1.
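As a quick sanity check, the density above can be evaluated numerically. The following is a minimal sketch using only the Python standard library; the function names order_stat_pdf and midpoint_integral and the choice n=5, i=2 are illustrative assumptions, not part of the original derivation. It integrates the density over (0,1) with a simple midpoint rule, which should give a total close to 1, and also computes the mean numerically.

```python
from math import factorial

def order_stat_pdf(y, i, n):
    """Density of the i-th order statistic of n draws from U(0,1), for 0 < y < 1."""
    coeff = factorial(n) / (factorial(i - 1) * factorial(n - i))
    return coeff * y ** (i - 1) * (1 - y) ** (n - i)

def midpoint_integral(f, a, b, steps=10000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

# Illustrative values (an assumption for this sketch): n = 5, i = 2.
n, i = 5, 2
total = midpoint_integral(lambda y: order_stat_pdf(y, i, n), 0.0, 1.0)
mean = midpoint_integral(lambda y: y * order_stat_pdf(y, i, n), 0.0, 1.0)
print("integral of the density:", total)
print("numerical mean:", mean)
```

For these values the numerical mean should come out close to 1/3.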

The above density function is from the family of beta distributions. In general, the pdf of a beta distribution and its mean and variance are:

\displaystyle f_{W}(w)=\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \thinspace w^{a-1} \thinspace [1-w]^{b-1} where 0<w<1, a>0, b>0, and \Gamma(\cdot) is the gamma function.

\displaystyle E[W]=\frac{a}{a+b}

\displaystyle Var[W]=\frac{ab}{(a+b)^2 (a+b+1)}

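Since \Gamma(k)=(k-1)! for a positive integer k, the density of Y_i is the beta density with a=i and b=n-i+1. The following shows the pdf of the i^{th} order statistic of the uniform distribution on the unit interval, together with its mean and variance: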

\displaystyle f_{Y_i}(y)=\frac{\Gamma(n+1)}{\Gamma(i) \Gamma(n-i+1)} \thinspace y^{i-1} \thinspace [1-y]^{(n-i+1)-1} where 0<y<1.

\displaystyle E[Y_i]=\frac{i}{i+(n-i+1)}=\frac{i}{n+1}

\displaystyle Var[Y_i]=\frac{i(n-i+1)}{(n+1)^2 (n+2)}
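The mean and variance formulas can be checked by simulation. The sketch below is only an illustration under stated assumptions: the helper names, the number of trials, the seed, and the choice n=11, i=3 are all arbitrary. It repeatedly draws n uniform values, records the i-th smallest, and compares the sample mean and variance with the beta values for a=i and b=n-i+1.

```python
import random

def simulate_order_statistic(n, i, trials=20000, seed=0):
    """Simulate the i-th order statistic (1-based) of n draws from U(0,1)."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        values.append(sample[i - 1])
    return values

def beta_mean_var(a, b):
    """Mean and variance of the beta distribution with parameters a and b."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Illustrative values (an assumption for this sketch): n = 11, i = 3.
n, i = 11, 3
draws = simulate_order_statistic(n, i)
sim_mean = sum(draws) / len(draws)
sim_var = sum((d - sim_mean) ** 2 for d in draws) / (len(draws) - 1)
theory_mean, theory_var = beta_mean_var(i, n - i + 1)

print("simulated mean:", sim_mean, " theoretical mean:", theory_mean)
print("simulated variance:", sim_var, " theoretical variance:", theory_var)
```

The simulated values will differ slightly from run to run if the seed is changed, but they should stay close to i/(n+1) and i(n-i+1)/((n+1)^2 (n+2)).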

Estimation of Percentiles
In descriptive statistics, we define the sample percentiles using the order statistics (even though the term order statistics may not be used in a non-calculus based introductory statistics course). For example, if the sample size is an odd integer n=2m+1, then the sample median is the order statistic Y_{m+1}. The preceding discussion on the order statistics of the uniform distribution shows that this approach is a sound one.

Suppose we have a random sample of size n from an arbitrary continuous distribution. The order statistics listed in ascending order are:

\displaystyle Y_1<Y_2<Y_3< \cdots <Y_n

For each i \le n, consider W_i=F(Y_i). Since the distribution function F(x) is a non-decreasing function, the W_i are also in ascending order (strictly so with probability one):

\displaystyle W_1<W_2<W_3< \cdots <W_n

It can be shown that if F(x) is the distribution function of a continuous random variable X, then the transformation F(X) follows the uniform distribution U(0,1). Thus each F(X_j) \sim U(0,1), and the W_i are the order statistics of the following transformed random sample, which is a random sample from U(0,1):

\displaystyle F(X_1),F(X_2), \cdots, F(X_n)

By the preceding discussion, \displaystyle E[W_i]=E[F(Y_i)]=\frac{i}{n+1}. Note that F(Y_i) is the area under the density function f(x) and to the left of Y_i. Thus F(Y_i) is a random area and E[W_i]=E[F(Y_i)] is the expected area under the density curve f(x) to the left of Y_i.

For example, suppose the sample size n is an odd integer where n=2m+1. Then the sample median is Y_{m+1}. Note that \displaystyle E[W_{m+1}]=\frac{m+1}{n+1}=\frac{1}{2}. Thus if we choose Y_{m+1} as a point estimate for the population median, Y_{m+1} is expected to be above 50% of the population.

Furthermore, E[W_i - W_{i-1}] is the expected area under the density curve and between Y_i and Y_{i-1}. This expected area is:

\displaystyle E[W_i - W_{i-1}]=E[F(Y_i)]-E[F(Y_{i-1})]=\frac{i}{n+1}-\frac{i-1}{n+1}=\frac{1}{n+1}

The expected area under the density curve and above the maximum order statistic Y_n is:

\displaystyle E[1-F(Y_n)]=1-\frac{n}{n+1}=\frac{1}{n+1}

Consequently we have an interesting observation about the order statistics Y_1<Y_2<Y_3< \cdots <Y_n. The order statistics Y_i divide the area under the density curve f(x) and above the x-axis into n+1 areas. On average, each of these areas is \displaystyle \frac{1}{n+1}.
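The equal expected areas can be illustrated with a non-uniform distribution. The following sketch assumes, purely for illustration, an exponential distribution with rate 1, whose distribution function is F(x)=1-e^{-x}; the function names, the sample size n=4, the number of trials and the seed are also assumptions. It averages the n+1 areas cut out by the order statistics over many simulated samples.

```python
import math
import random

def exponential_cdf(x, rate=1.0):
    """Distribution function F(x) = 1 - exp(-rate * x) of the exponential distribution."""
    return 1.0 - math.exp(-rate * x)

def average_areas(n, trials=20000, seed=1):
    """Average the n+1 areas under the density cut out by the order statistics."""
    rng = random.Random(seed)
    totals = [0.0] * (n + 1)
    for _ in range(trials):
        ys = sorted(rng.expovariate(1.0) for _ in range(n))
        # Areas to the left of each Y_i, padded with 0 on the left and 1 on the right.
        cuts = [0.0] + [exponential_cdf(y) for y in ys] + [1.0]
        for k in range(n + 1):
            totals[k] += cuts[k + 1] - cuts[k]
    return [t / trials for t in totals]

# Illustrative sample size (an assumption for this sketch): n = 4, so 1/(n+1) = 0.2.
areas = average_areas(n=4)
print(areas)
```

Each of the five averages should be close to 0.2, even though the exponential density is far from flat.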

As a result, it makes sense to use order statistics as estimators of percentiles. For example, we can use Y_i as the (100p)^{th} percentile of the sample where \displaystyle p=\frac{i}{n+1}. Then Y_i is an estimator of the population percentile \tau_{p} where the area under the density curve f(x) and to the left of \tau_{p} is p. In the case that (n+1)p is not an integer, we interpolate between two order statistics. For example, if (n+1)p=5.7, then we interpolate between Y_5 and Y_6.

Example
Suppose we have a random sample of size n=11 drawn from a continuous distribution. Find estimators for the median, first quartile and third quartile. Find an estimate for the 85^{th} percentile. Construct an 87% confidence interval for the 40^{th} percentile.

The estimator for the median is Y_6, since 12(0.5)=6. The estimator for the first quartile (25^{th} percentile) is Y_3, since 12(0.25)=3. The estimator for the third quartile (75^{th} percentile) is Y_9, since 12(0.75)=9. Based on the preceding discussion, the expected areas under the density curve f(x) to the left of Y_3,Y_6,Y_9 are 0.25, 0.5 and 0.75, respectively.

To estimate the 85^{th} percentile, note that (n+1)p=12(0.85)=10.2. Thus we interpolate between Y_{10} and Y_{11}. In our example, we use linear interpolation, which moves 0.2 of the way from Y_{10} to Y_{11}, though taking the arithmetic average of Y_{10} and Y_{11} is also a valid approach. The following is an estimate of the 85^{th} percentile.

\displaystyle \hat{\tau}_{0.85}=0.8Y_{10}+0.2Y_{11}
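The percentile rule used in this example can be written as a small function. This is a minimal sketch, not a standard library routine: the name percentile_estimate, the decision to raise an error when (n+1)p falls outside the range 1 to n, and the toy sample of the integers 1 through 11 are all assumptions made for illustration.

```python
import math

def percentile_estimate(sample, p):
    """Estimate the (100p)-th percentile by interpolating at position (n+1)p."""
    ys = sorted(sample)
    n = len(ys)
    pos = (n + 1) * p
    if pos < 1 or pos > n:
        # Assumption of this sketch: refuse to extrapolate beyond Y_1 and Y_n.
        raise ValueError("(n+1)p must lie between 1 and n")
    k = math.floor(pos)          # index of the lower order statistic Y_k
    frac = pos - k               # fractional distance toward Y_{k+1}
    if k == n:
        return ys[n - 1]
    return (1 - frac) * ys[k - 1] + frac * ys[k]

# Toy sample (hypothetical data): the integers 1..11, so Y_i = i.
sample = list(range(1, 12))
median = percentile_estimate(sample, 0.5)      # uses Y_6
q1 = percentile_estimate(sample, 0.25)         # uses Y_3
q3 = percentile_estimate(sample, 0.75)         # uses Y_9
p85 = percentile_estimate(sample, 0.85)        # 0.8*Y_10 + 0.2*Y_11
print(median, q1, q3, p85)
```

With this toy sample the 85^{th} percentile estimate is 0.8(10)+0.2(11), up to floating-point rounding.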

To find the confidence interval, consider the probability P[Y_2 < \tau_{0.4} < Y_7] where \tau_{0.4} is the 40^{th} percentile. Consider the event X \le \tau_{0.4} as a success with probability of success p=0.4. For Y_2 < \tau_{0.4} < Y_7 to happen, there must be at least 2 successes and fewer than 7 successes among the 11 sample items, and the number of successes follows the binomial distribution with n=11 and p=0.4. Thus we have:

\displaystyle P[Y_2 < \tau_{0.4} < Y_7]=\sum \limits_{j=2}^{6} \binom{11}{j} 0.4^{j} 0.6^{11-j}=0.8704

Thus the interval (Y_2,Y_7) can be taken as an 87% confidence interval for \tau_{0.4}. This is an example of a distribution-free confidence interval because nothing is assumed about the underlying distribution, other than continuity, in the construction of the confidence interval.
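The binomial sum behind this confidence level is easy to reproduce. The sketch below uses only the Python standard library; the function name coverage and its argument names are assumptions made for illustration. It computes P[Y_lo < \tau_p < Y_hi] as the probability of at least lo and at most hi-1 successes in n trials.

```python
from math import comb

def coverage(n, p, lo, hi):
    """P[Y_lo < tau_p < Y_hi]: at least lo and at most hi-1 successes out of n."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(lo, hi))

# The interval from the example: n = 11, p = 0.4, (Y_2, Y_7).
level = coverage(11, 0.4, 2, 7)
print(round(level, 4))

# Other pairs of order statistics can be compared the same way.
for lo, hi in [(1, 8), (2, 7), (3, 6)]:
    print((lo, hi), coverage(11, 0.4, lo, hi))
```

Because only pairs of order statistics are available as endpoints, the attainable confidence levels form a discrete set, which is why the example lands on 87% rather than a round figure such as 90%.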

7 thoughts on “The order statistics and the uniform distribution”

  1. Hi,

    For me, as a beginner in mathematical statistics, this is useful stuff and a good explanation. Only the definition of the beta function is incorrect. It took me an hour of frustrating computation before I figured out that you just forgot a w^(-1) in the beta function ;)

    Peter

  2. Peter

    I am glad you found the post useful and the explanation good. Thanks for pointing out the incorrect definition of the beta distribution, which has been corrected.

    Dan

  3. Most of the examples on the distribution of the sample median from a U(0,1) consider an odd sample size. I tried obtaining the distribution of the sample median from a U(0,1) when n=4. I got the joint distribution of X(2) and X(3) and tried transforming it to w=X(3) and m=(X(2)+X(3))/2 using the Jacobian technique and integrating out w to get the marginal of m. However, when I tried to verify that the density of m is really a pdf, it integrated to zero. Do you have the correct procedure for this?

  5. It is interesting that the distribution of the order statistics is in the same family as the original distribution, because the uniform distribution is a special case of a beta distribution. Do we know necessary and sufficient conditions on a family of distributions under which the transformation from the original distribution to the distribution of the order statistics maps the family into itself, meaning that if you start with one family you get the same family? For example, I guess the distribution of the order statistics of the normal distribution is not a normal distribution.

    Thanks for a wonderful post.

    Michael
