In this post, we show that the order statistics of the uniform distribution on the unit interval follow beta distributions. This leads to a discussion on the estimation of percentiles using order statistics. We also present an example of using order statistics to construct confidence intervals for population percentiles. For a discussion on the distributions of order statistics of random samples drawn from a continuous distribution, see the previous post The distributions of the order statistics.
Suppose that we have a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from a continuous distribution with common distribution function $F(x)$ and common density function $f(x)$. The order statistics $Y_1 < Y_2 < \cdots < Y_n$ are obtained by ordering the sample $X_1, X_2, \ldots, X_n$ in ascending order. In other words, $Y_1$ is the smallest item in the sample and $Y_2$ is the second smallest item in the sample and so on. Since this is random sampling from a continuous distribution, we assume that the probability of a tie between two order statistics is zero. In the previous post The distributions of the order statistics, we derive the probability density function of the $i^{th}$ order statistic:
$$f_{Y_i}(y) = \frac{n!}{(i-1)!\,(n-i)!}\, F(y)^{i-1}\, [1-F(y)]^{n-i}\, f(y)$$
The Order Statistics of the Uniform Distribution
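Suppose that the random sample $X_1, X_2, \ldots, X_n$ is drawn from $U(0,1)$. Since the distribution function of $U(0,1)$ is $F(y) = y$ where $0 < y < 1$, the probability density function of the $i^{th}$ order statistic is:
$$f_{Y_i}(y) = \frac{n!}{(i-1)!\,(n-i)!}\, y^{i-1} (1-y)^{n-i}, \quad 0 < y < 1$$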
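The above density function is from the family of beta distributions. In general, the pdf of a beta distribution with parameters $a > 0$ and $b > 0$, and its mean and variance, are:
$$f_W(w) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, w^{a-1} (1-w)^{b-1}, \quad 0 < w < 1$$
$$E[W] = \frac{a}{a+b}$$
$$Var[W] = \frac{ab}{(a+b)^2 (a+b+1)}$$
Then the following shows the pdf of the $i^{th}$ order statistic of the uniform distribution on the unit interval, which is the beta distribution with $a = i$ and $b = n-i+1$, and its mean and variance:
$$f_{Y_i}(y) = \frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\, y^{i-1} (1-y)^{(n-i+1)-1}, \quad 0 < y < 1$$
$$E[Y_i] = \frac{i}{n+1}$$
$$Var[Y_i] = \frac{i\,(n-i+1)}{(n+1)^2 (n+2)}$$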
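As a rough numerical check, the sketch below simulates the $i^{th}$ order statistic of a $U(0,1)$ sample and compares its sample mean and variance with the mean and variance of a beta distribution with $a = i$ and $b = n-i+1$. The function names, the choice $n = 11$, $i = 3$, the number of trials and the seed are illustrative assumptions, not part of the original post.

```python
import random
import statistics

def simulate_uniform_order_statistic(n, i, trials=20000, seed=0):
    """Draw `trials` samples of size n from U(0,1); return the i-th smallest of each."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(n))[i - 1] for _ in range(trials)]

def beta_mean_var(a, b):
    """Mean and variance of a beta distribution with parameters a and b."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

n, i = 11, 3  # illustrative choice: third order statistic of a sample of size 11
draws = simulate_uniform_order_statistic(n, i)
theory_mean, theory_var = beta_mean_var(i, n - i + 1)  # a = i, b = n - i + 1

# The simulated moments should be close to the beta moments.
print(statistics.fmean(draws), theory_mean)
print(statistics.pvariance(draws), theory_var)
```

Agreement in a simulation is only a sanity check, not a proof. The derivation above is what establishes the result.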
Estimation of Percentiles
In descriptive statistics, we define the sample percentiles using the order statistics (even though the term order statistics may not be used in a non-calculus based introductory statistics course). For example, if the sample size is an odd integer $n = 2m+1$, then the sample median is the $(m+1)^{th}$ order statistic $Y_{m+1}$. The preceding discussion on the order statistics of the uniform distribution can show us that this approach is a sound one.
Suppose we have a random sample of size $n$ from an arbitrary continuous distribution. The order statistics listed in ascending order are:
$$Y_1 < Y_2 < \cdots < Y_n$$
For each $i \le n$, consider $W_i = F(Y_i)$. Since the distribution function $F(x)$ is a non-decreasing function, the $W_i$ are also increasing:
$$W_1 < W_2 < \cdots < W_n$$
It can be shown that if $F(x)$ is the distribution function of a continuous random variable $X$, then the transformation $F(X)$ follows the uniform distribution $U(0,1)$. Then the following transformed random sample:
$$F(X_1), F(X_2), \ldots, F(X_n)$$
is drawn from the uniform distribution $U(0,1)$. Furthermore, $W_i = F(Y_i)$, $i = 1, \ldots, n$, are the order statistics for this random sample. By the preceding discussion, $E[W_i] = E[F(Y_i)] = \frac{i}{n+1}$. Note that $F(Y_i)$ is the area under the density function $f(x)$ and to the left of $Y_i$. Thus $F(Y_i)$ is a random area and $E[F(Y_i)]$ is the expected area under the density curve $f(x)$ to the left of $Y_i$. Recall that $f(x)$ is the common density function of the original sample $X_1, X_2, \ldots, X_n$.
For example, suppose the sample size $n$ is an odd integer where $n = 2m+1$. Then the sample median is $Y_{m+1}$. Note that $E[F(Y_{m+1})] = \frac{m+1}{n+1} = \frac{m+1}{2m+2} = \frac{1}{2}$. Thus if we choose $Y_{m+1}$ as a point estimate for the population median, $Y_{m+1}$ is expected to be above the bottom 50% of the population and is expected to be below the upper 50% of the population.
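Furthermore, $E[F(Y_i) - F(Y_{i-1})]$ is the expected area under the density curve and between $Y_{i-1}$ and $Y_i$. This expected area is:
$$E[F(Y_i) - F(Y_{i-1})] = \frac{i}{n+1} - \frac{i-1}{n+1} = \frac{1}{n+1}$$
The expected area under the density curve and above the maximum order statistic $Y_n$ is:
$$E[1 - F(Y_n)] = 1 - \frac{n}{n+1} = \frac{1}{n+1}$$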
Consequently, here is an interesting observation about the order statistics $Y_1 < Y_2 < \cdots < Y_n$. The order statistics $Y_i$ divide the area under the density curve $f(x)$ and above the x-axis into $n+1$ areas. On average, each of these areas is $\frac{1}{n+1}$.
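The observation that the order statistics cut the area under the density curve into $n+1$ pieces of equal expected size can be illustrated with a small simulation. The sketch below assumes, purely for illustration, an exponential distribution with rate 1 (so $F(y) = 1 - e^{-y}$), a sample size of 4, and hypothetical function and parameter names.

```python
import math
import random

def expected_areas(n, trials=20000, seed=1):
    """Average the n+1 areas cut out of an Exp(1) density by the order statistics."""
    rng = random.Random(seed)
    totals = [0.0] * (n + 1)
    for _ in range(trials):
        y = sorted(rng.expovariate(1.0) for _ in range(n))
        # F(y) = 1 - exp(-y) is the Exp(1) distribution function.
        # Pad with 0 and 1 so the first and last areas are included.
        w = [0.0] + [1.0 - math.exp(-v) for v in y] + [1.0]
        for k in range(n + 1):
            totals[k] += w[k + 1] - w[k]  # area between consecutive cut points
    return [t / trials for t in totals]

areas = expected_areas(n=4)
print(areas)  # each entry should be close to 1/(n+1) = 0.2
```

Any other continuous distribution could be substituted, as long as its distribution function is used in place of $1 - e^{-y}$. The averages should still be close to $\frac{1}{n+1}$.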
As a result, it makes sense to use order statistics as estimators of percentiles. For example, we can use $Y_i$ as the $(100p)^{th}$ percentile of the sample where $p = \frac{i}{n+1}$. Then $Y_i$ is an estimator of the population percentile $\tau_p$ where the area under the density curve $f(x)$ and to the left of $\tau_p$ is $p$. In the case that $(n+1)p$ is not an integer, we interpolate between two order statistics. For example, if $(n+1)p$ falls strictly between the integers $j$ and $j+1$, then we interpolate between $Y_j$ and $Y_{j+1}$.
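The percentile rule described above can be sketched in a few lines of Python. This is a minimal illustration that assumes linear interpolation between the two neighboring order statistics. The function name `percentile_estimate` and the decision to raise an error when $(n+1)p$ falls outside the range 1 to $n$ are choices made here, not something the post specifies.

```python
import math

def percentile_estimate(sample, p):
    """Estimate the 100p-th percentile using the (n+1)p rule."""
    y = sorted(sample)  # y[0] is Y_1, ..., y[n-1] is Y_n
    n = len(y)
    h = (n + 1) * p     # target rank (1-based), possibly fractional
    if h < 1 or h > n:
        raise ValueError("(n+1)p must lie between 1 and n")
    j = math.floor(h)   # index of the lower order statistic Y_j
    r = h - j           # fractional part, used as the interpolation weight
    if j == n:          # h equals n exactly, so there is no Y_{n+1} to use
        return y[n - 1]
    # Linear interpolation between Y_j and Y_{j+1}.
    return y[j - 1] + r * (y[j] - y[j - 1])

sample = [7, 2, 9, 4, 11, 1, 6, 3, 10, 5, 8]  # made-up data, n = 11
print(percentile_estimate(sample, 0.25))  # rank 3, the third order statistic
print(percentile_estimate(sample, 0.50))  # rank 6, the sample median
print(percentile_estimate(sample, 0.75))  # rank 9, the ninth order statistic
```

When $(n+1)p$ is an integer the weight is zero and the function returns the corresponding order statistic unchanged.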
Example
Suppose we have a random sample of size $n = 11$ drawn from a continuous distribution. Find estimators for the median, first quartile and third quartile. Find an estimate for the $(100p)^{th}$ percentile when $(n+1)p$ is not an integer. Construct an 87% confidence interval for the $40^{th}$ percentile.
With $n = 11$, the estimator for the median is the sixth order statistic $Y_6$. The estimator for the first quartile ($25^{th}$ percentile) is the third order statistic $Y_3$. The estimator for the third quartile ($75^{th}$ percentile) is the ninth order statistic $Y_9$. Based on the preceding discussion, the expected areas under the density curve $f(x)$ to the left of $Y_3$, $Y_6$ and $Y_9$ are 0.25, 0.5 and 0.75, respectively.
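To find the $(100p)^{th}$ percentile, note that $(n+1)p = 12p$. When this is not an integer, write $12p = j + r$ where $j$ is an integer and $0 < r < 1$. Thus we interpolate between $Y_j$ and $Y_{j+1}$. In our example, we use linear interpolation, though taking the arithmetic average of $Y_j$ and $Y_{j+1}$ is also a valid approach. The following is an estimate of the $(100p)^{th}$ percentile.
$$\hat{\tau}_p = Y_j + r\,(Y_{j+1} - Y_j) = (1-r)\,Y_j + r\,Y_{j+1}$$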
To find the confidence interval, consider the probability $P[Y_2 < \tau_{0.4} < Y_7]$ where $\tau_{0.4}$ is the $40^{th}$ percentile. Consider the event $X < \tau_{0.4}$ as a success with probability of success $p = 0.4$. For $Y_2 < \tau_{0.4} < Y_7$ to happen, there must be at least 2 successes and fewer than 7 successes in the binomial distribution with $n = 11$ and $p = 0.4$. Thus we have:
$$P[Y_2 < \tau_{0.4} < Y_7] = \sum_{k=2}^{6} \binom{11}{k} (0.4)^k (0.6)^{11-k} \approx 0.8704$$
Thus the interval $(Y_2, Y_7)$ can be taken as the 87% confidence interval for $\tau_{0.4}$. This is an example of a distribution-free confidence interval because nothing is assumed about the underlying distribution in the construction of the confidence interval.
________________________________________________________________________
Hi,
For me, as a beginner in mathematical statistics, this is useful stuff and a good explanation. Only the definition of the Beta function is incorrect. It took me an hour of frustrating computation before I figured out that you just forgot a w^(-1) in the beta function 😉
Peter
Peter
I am glad you found the post useful and the explanation clear. Thanks for pointing out the incorrect definition of the beta distribution, which has been corrected.
Dan
Most of the examples on the distribution of the sample median from a U(0,1) consider an odd sample size. I tried obtaining the distribution of the sample median from a U(0,1) when n=4. I got the joint distribution of X(2) and X(3) and tried transforming it to w=X(3) and m=(X(2)+X(3))/2 using the Jacobian technique and integrating out w to get the marginal of m. However, when I tried to verify that the density of m is really a pdf, it integrated to zero. Do you have the correct procedure for this?
It is interesting that the distribution of the order statistics is in the same family as the original distribution, because the uniform distribution is a special case of a beta distribution. Do we know necessary and sufficient conditions on a family of distributions for the order statistics to stay within that family, meaning that if you start with one family you get the same family? For example, I guess the distribution of the order statistics of the normal distribution is not a normal distribution.
Thanks for a wonderful post.
Michael
Hi, can I ask something? Is there any known distribution of average of iid beta random variables with a=theta and b=1??
this post helped me understanding the material very much. thanks a lot ! you really explained the material well for beginners like me. b
ow thanks