In this post, we show that the order statistics of the uniform distribution on the unit interval follow beta distributions. This leads to a discussion of the estimation of percentiles using order statistics. We also present an example of using order statistics to construct confidence intervals for population percentiles. For a discussion of the distributions of order statistics of random samples drawn from a continuous distribution, see the previous post The distributions of the order statistics.
Suppose that we have a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from a continuous distribution with common distribution function $F(x)$ and common density function $f(x)$. The order statistics $Y_1<Y_2<\cdots<Y_n$ are obtained by ordering the sample in ascending order. In other words, $Y_1$ is the smallest item in the sample, $Y_2$ is the second smallest item, and so on. Since this is random sampling from a continuous distribution, we assume that the probability of a tie between two order statistics is zero. In the previous post The distributions of the order statistics, we derived the probability density function of the $i^{th}$ order statistic:

$$f_{Y_i}(y)=\frac{n!}{(i-1)!\,(n-i)!}\,F(y)^{i-1}\,[1-F(y)]^{n-i}\,f(y)$$
The Order Statistics of the Uniform Distribution
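Suppose that the random sample $X_1, X_2, \ldots, X_n$ is drawn from $U(0,1)$. Since the distribution function of $U(0,1)$ is $F(y)=y$ where $0<y<1$, the probability density function of the $i^{th}$ order statistic is:

$$f_{Y_i}(y)=\frac{n!}{(i-1)!\,(n-i)!}\,y^{i-1}\,(1-y)^{n-i}, \qquad 0<y<1$$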
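The above density function is from the family of beta distributions. In general, the pdf of a beta distribution with parameters $a>0$ and $b>0$ is:

$$f_W(w)=\frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,w^{a-1}\,(1-w)^{b-1}$$

where $0<w<1$ and $\Gamma(\cdot)$ is the gamma function. Its mean and variance are:

$$E[W]=\frac{a}{a+b}, \qquad Var[W]=\frac{ab}{(a+b)^2\,(a+b+1)}$$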
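Taking $a=i$ and $b=n-i+1$, the following shows the pdf of the $i^{th}$ order statistic of the uniform distribution on the unit interval and its mean and variance:

$$f_{Y_i}(y)=\frac{\Gamma(n+1)}{\Gamma(i)\,\Gamma(n-i+1)}\,y^{i-1}\,(1-y)^{(n-i+1)-1}, \qquad 0<y<1$$

$$E[Y_i]=\frac{i}{n+1}, \qquad Var[Y_i]=\frac{i\,(n-i+1)}{(n+1)^2\,(n+2)}$$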
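As a quick sanity check, the mean $\frac{i}{n+1}$ and variance $\frac{i(n-i+1)}{(n+1)^2(n+2)}$ of the $i^{th}$ uniform order statistic can be compared against a simulation. The following is a minimal sketch using only the Python standard library. The function names, the seed, and the choice $n=5$, $i=2$ are our own illustrative assumptions and are not part of the derivation.

```python
import random
from statistics import fmean, pvariance

def uniform_order_stat_moments(i, n):
    """Theoretical mean and variance of the i-th order statistic of n U(0,1) draws."""
    mean = i / (n + 1)
    var = i * (n - i + 1) / ((n + 1) ** 2 * (n + 2))
    return mean, var

def simulate_order_stat(i, n, trials, rng):
    """Draw `trials` samples of size n and keep the i-th smallest value of each."""
    return [sorted(rng.random() for _ in range(n))[i - 1] for _ in range(trials)]

rng = random.Random(2024)  # arbitrary fixed seed so the run is repeatable
n, i = 5, 2                # illustrative choice, not taken from the post
draws = simulate_order_stat(i, n, 20000, rng)
theory_mean, theory_var = uniform_order_stat_moments(i, n)

print("mean:     simulated", fmean(draws), "theory", theory_mean)
print("variance: simulated", pvariance(draws), "theory", theory_var)
```

With enough trials the simulated values should settle close to the theoretical ones, which for this choice are $\frac{1}{3}$ and $\frac{8}{252}$.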
Estimation of Percentiles
In descriptive statistics, we define the sample percentiles using the order statistics (even though the term order statistics may not be used in a non-calculus based introductory statistics course). For example, if the sample size is an odd integer $n=2m+1$, then the sample median is the $(m+1)^{th}$ order statistic $Y_{m+1}$. The preceding discussion on the order statistics of the uniform distribution can show us that this approach is a sound one.
Suppose we have a random sample of size $n$ from an arbitrary continuous distribution. The order statistics listed in ascending order are:

$$Y_1<Y_2<Y_3<\cdots<Y_n$$
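For each $i \le n$, consider $W_i=F(Y_i)$. Since the distribution function $F(x)$ is a non-decreasing function, the $W_i$ are also in ascending order (with strict inequalities holding with probability one):

$$W_1<W_2<W_3<\cdots<W_n$$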
It can be shown that if $F(x)$ is the distribution function of a continuous random variable $X$, then the transformation $F(X)$ follows the uniform distribution $U(0,1)$. Then each $F(X_i) \sim U(0,1)$, and the $W_i$ are the order statistics of the following transformed random sample:

$$F(X_1), F(X_2), \ldots, F(X_n)$$
By the preceding discussion, $E[W_i]=E[F(Y_i)]=\frac{i}{n+1}$. Note that $F(Y_i)$ is the area under the density function $f(x)$ and to the left of $Y_i$. Thus $F(Y_i)$ is a random area and $E[W_i]=E[F(Y_i)]$ is the expected area under the density curve $f(x)$ to the left of $Y_i$.
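To see that the expected area to the left of the $i^{th}$ order statistic is $\frac{i}{n+1}$ even when the sample is not uniform, we can try a non-uniform distribution. The sketch below assumes an exponential distribution with rate 1, whose distribution function is $F(x)=1-e^{-x}$. The distribution, the sample size $n=4$, the seed, and the function names are illustrative assumptions on our part.

```python
import math
import random

def exp_cdf(x, rate=1.0):
    """Distribution function of the exponential distribution: F(x) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x)

def mean_areas_left_of_order_stats(n, trials, rng):
    """Average of F(Y_i) over many samples of size n, for i = 1, ..., n."""
    totals = [0.0] * n
    for _ in range(trials):
        ordered = sorted(rng.expovariate(1.0) for _ in range(n))
        for idx, y in enumerate(ordered):
            totals[idx] += exp_cdf(y)  # random area to the left of Y_(idx + 1)
    return [t / trials for t in totals]

rng = random.Random(7)  # arbitrary fixed seed
n = 4                   # illustrative sample size
averages = mean_areas_left_of_order_stats(n, 20000, rng)
expected = [i / (n + 1) for i in range(1, n + 1)]  # 0.2, 0.4, 0.6, 0.8

for i, (avg, exp_area) in enumerate(zip(averages, expected), start=1):
    print(f"i={i}: simulated {avg:.4f}, i/(n+1) = {exp_area:.4f}")
```

The same check should work for any continuous distribution whose distribution function can be evaluated, since only $F(Y_i)$ enters the calculation.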
For example, suppose the sample size $n$ is an odd integer where $n=2m+1$. Then the sample median is $Y_{m+1}$. Note that $E[W_{m+1}]=\frac{m+1}{n+1}=\frac{m+1}{2m+2}=\frac{1}{2}$. Thus if we choose $Y_{m+1}$ as a point estimate for the population median, $Y_{m+1}$ is expected to be above 50% of the population.
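Furthermore, $E[W_i-W_{i-1}]$ is the expected area under the density curve between $Y_{i-1}$ and $Y_i$. This expected area is:

$$E[W_i-W_{i-1}]=E[F(Y_i)]-E[F(Y_{i-1})]=\frac{i}{n+1}-\frac{i-1}{n+1}=\frac{1}{n+1}$$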
The expected area under the density curve and above the maximum order statistic $Y_n$ is:

$$E[1-F(Y_n)]=1-\frac{n}{n+1}=\frac{1}{n+1}$$
Consequently we have an interesting observation about the order statistics $Y_1<Y_2<\cdots<Y_n$. The order statistics $Y_i$ divide the area under the density curve $f(x)$ and above the x-axis into $n+1$ areas. On average, each of these areas is $\frac{1}{n+1}$.
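The observation that the order statistics cut the area under the density curve into $n+1$ pieces of average size $\frac{1}{n+1}$ can also be checked numerically. The following is a rough sketch that assumes a standard normal population and $n=5$. Both choices, along with the seed and the function name, are ours and serve only as an illustration.

```python
import random
from statistics import NormalDist

def mean_gap_areas(n, trials, rng):
    """Average the n + 1 areas cut out by the order statistics of a normal sample."""
    cdf = NormalDist().cdf  # standard normal distribution function
    totals = [0.0] * (n + 1)
    for _ in range(trials):
        # F is increasing, so sorting F(X_i) gives F(Y_1) <= ... <= F(Y_n).
        w = sorted(cdf(rng.gauss(0.0, 1.0)) for _ in range(n))
        edges = [0.0] + w + [1.0]
        for k in range(n + 1):
            totals[k] += edges[k + 1] - edges[k]  # area between consecutive cut points
    return [t / trials for t in totals]

rng = random.Random(11)  # arbitrary fixed seed
n = 5                    # illustrative sample size
areas = mean_gap_areas(n, 20000, rng)
print([round(a, 4) for a in areas], "target:", round(1 / (n + 1), 4))
```

Each of the six averages should land near $\frac{1}{6}$, including the two end pieces to the left of the minimum and to the right of the maximum.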
As a result, it makes sense to use order statistics as estimators of percentiles. For example, we can use $Y_i$ as the $(100p)^{th}$ percentile of the sample where $p=\frac{i}{n+1}$. Then $Y_i$ is an estimator of the population percentile $\tau_p$, where the area under the density curve $f(x)$ and to the left of $\tau_p$ is $p$. In the case that $(n+1)p$ is not an integer, we interpolate between two order statistics. For example, if $(n+1)p=5.7$, then we interpolate between $Y_5$ and $Y_6$.
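The percentile rule just described can be sketched as a small function: locate the position $(n+1)p$ among the order statistics and interpolate linearly when it is not an integer. The function name, the error handling, and the nine data values below are hypothetical; they are chosen so that $(n+1)p=5.7$ when $p=0.57$.

```python
import math

def percentile_estimate(sample, p):
    """Estimate the (100p)-th percentile by linear interpolation at position (n + 1) * p."""
    ys = sorted(sample)          # order statistics Y_1 <= ... <= Y_n
    n = len(ys)
    pos = (n + 1) * p            # 1-based position among the order statistics
    if not 1 <= pos <= n:
        raise ValueError("(n + 1) * p must lie between 1 and n")
    lower = math.floor(pos)      # index of the order statistic just below the position
    frac = pos - lower           # how far to move toward the next order statistic
    if frac == 0:
        return ys[lower - 1]
    return (1 - frac) * ys[lower - 1] + frac * ys[lower]

# Hypothetical sample of size n = 9, so (n + 1) * 0.57 = 5.7.
data = [12, 5, 9, 3, 17, 8, 14, 11, 19]
estimate = percentile_estimate(data, 0.57)
print(estimate)  # interpolates between Y_5 = 11 and Y_6 = 12, giving about 11.7
```

Linear interpolation is only one reasonable convention. Averaging the two neighboring order statistics would be another, and statistical software packages differ in the rule they apply.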
As an example, suppose we have a random sample of size $n=11$ drawn from a continuous distribution. Find estimators for the median, first quartile and third quartile. Find an estimate for the $85^{th}$ percentile. Construct an 87% confidence interval for the $40^{th}$ percentile.
The estimator for the median is $Y_6$. The estimator for the first quartile ($25^{th}$ percentile) is $Y_3$. The estimator for the third quartile ($75^{th}$ percentile) is $Y_9$. Based on the preceding discussion, the expected areas under the density curve $f(x)$ to the left of $Y_3$, $Y_6$ and $Y_9$ are 0.25, 0.5 and 0.75, respectively.
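To find the $85^{th}$ percentile, note that $(11+1)(0.85)=10.2$. Thus we interpolate between $Y_{10}$ and $Y_{11}$. In our example, we use linear interpolation, though taking the arithmetic average of $Y_{10}$ and $Y_{11}$ is also a valid approach. The following is an estimate of the $85^{th}$ percentile.

$$\hat{\tau}_{0.85}=0.8\,Y_{10}+0.2\,Y_{11}$$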
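To find the confidence interval, consider the probability $P[Y_2<\tau_{0.4}<Y_7]$ where $\tau_{0.4}$ is the $40^{th}$ percentile. Consider the event $X<\tau_{0.4}$ as a success with probability of success $p=0.4$. For $Y_2<\tau_{0.4}<Y_7$ to happen, at least 2 but fewer than 7 of the sample items must fall below $\tau_{0.4}$. That is, there must be at least 2 successes and fewer than 7 successes in the binomial distribution with $n=11$ and $p=0.4$. Thus we have:

$$P[Y_2<\tau_{0.4}<Y_7]=\sum_{k=2}^{6}\binom{11}{k}\,(0.4)^k\,(0.6)^{11-k}=0.8704$$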
Thus the interval $(Y_2, Y_7)$ can be taken as the 87% confidence interval for $\tau_{0.4}$. This is an example of a distribution-free confidence interval because nothing is assumed about the underlying distribution in the construction of the confidence interval.
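The confidence level of such an interval is just a binomial sum, so it is easy to compute directly. The sketch below assumes the setting of the example with sample size 11, the $40^{th}$ percentile, and the second and seventh order statistics as endpoints, values that reproduce the stated 87%. The function name and its parameters are our own.

```python
from math import comb

def coverage_probability(n, p, lo, hi):
    """P[Y_lo < tau_p < Y_hi]: at least `lo` and fewer than `hi` of n items fall below tau_p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi))

coverage = coverage_probability(n=11, p=0.4, lo=2, hi=7)
print(round(coverage, 4))  # about 0.8704
```

Trying other pairs of order statistics with the same function shows how the confidence level changes with the choice of endpoints. Because the binomial distribution is discrete, only certain confidence levels are attainable for a given sample size.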