Sample statistics such as the sample median, sample quartiles, and the sample minimum and maximum play a prominent role in the analysis of empirical data (e.g. in descriptive statistics and exploratory data analysis (EDA)). In this post we discuss order statistics and their distributions. The order statistics are the items from a random sample arranged in increasing order. The focus here is on presenting the distribution functions and probability density functions of order statistics. Order statistics are important tools in non-parametric statistical inference. In subsequent posts, we will present examples of applications in non-parametric methods.
In this post, we only consider random samples obtained from a continuous distribution (i.e. the distribution function is a continuous function). Let $X_1,X_2,\dots,X_n$ be a random sample of size $n$ from a continuous distribution with distribution function $F(x)$. We order the random sample in increasing order and obtain $Y_1<Y_2<\cdots<Y_n$. In other words, we have:
$Y_1=$ the smallest of $X_1,X_2,\dots,X_n$
$Y_2=$ the second smallest of $X_1,X_2,\dots,X_n$
$Y_n=$ the largest of $X_1,X_2,\dots,X_n$
We set $Y_1=\min(X_1,\dots,X_n)$ and $Y_n=\max(X_1,\dots,X_n)$. The order statistic $Y_j$ is called the $j$th order statistic. Since we are working with a continuous distribution, the probability of two sample items being equal is zero. Thus we can assume that $Y_1<Y_2<\cdots<Y_n$. That is, the probability of a tie among the order statistics is zero.
The Distribution Functions of the Order Statistics
The distribution function of $Y_j$ is an upper tail of a binomial distribution. If the event $Y_j \le y$ occurs, then there are at least $j$ sample items $X_i$ that are less than or equal to $y$. Consider the event $X_i \le y$ as a success and $F(y)=P(X_i \le y)$ as the probability of success. Then the drawing of each sample item becomes a Bernoulli trial (a success or a failure). We are interested in the probability of having at least $j$ successes. Thus the following is the distribution function of $Y_j$:

$$F_{Y_j}(y)=P(Y_j \le y)=\sum \limits_{k=j}^{n} \binom{n}{k} \, F(y)^k \, [1-F(y)]^{n-k} \ \ \ \ \ \ \ \ (1)$$
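The binomial upper-tail formula can be checked numerically. The following sketch (using only Python's standard library; the exponential(1) distribution and the specific $n$, $j$ and $y$ are arbitrary illustrative choices) compares the formula against a Monte Carlo estimate of $P(Y_j \le y)$:

```python
import math
import random

def order_stat_cdf(j, n, p):
    """P(Y_j <= y) as a binomial upper tail, where p = F(y)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(j, n + 1))

# Monte Carlo check with an exponential(1) sample, where F(y) = 1 - e^{-y}.
random.seed(42)
n, j, y = 5, 3, 0.8
trials = 200_000
hits = 0
for _ in range(trials):
    sample = sorted(random.expovariate(1.0) for _ in range(n))
    if sample[j - 1] <= y:          # sample[j-1] is the j-th order statistic
        hits += 1

p = 1 - math.exp(-y)                # F(y) for the exponential(1) distribution
print(round(hits / trials, 3), round(order_stat_cdf(j, n, p), 3))
```

The two printed values should agree to within Monte Carlo error.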
The following relationship is used in deriving the probability density function:

$$F_{Y_j}(y)=F_{Y_{j-1}}(y)-\binom{n}{j-1} \, F(y)^{j-1} \, [1-F(y)]^{n-j+1} \ \ \ \ \ \ \ \ (2)$$

This follows from (1): the upper tail starting at $k=j$ is the upper tail starting at $k=j-1$ minus the single term for $k=j-1$.
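The relationship between consecutive binomial upper tails can be verified numerically. A small sketch (the values of $n$ and the success probability are arbitrary illustrative choices):

```python
import math

def tail(j, n, p):
    """Upper binomial tail: P(at least j successes in n trials)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(j, n + 1))

# The tail starting at j equals the tail starting at j-1
# minus the single binomial term for k = j-1.
n, p = 7, 0.35
diffs = []
for j in range(2, n + 1):
    lhs = tail(j, n, p)
    rhs = tail(j - 1, n, p) - math.comb(n, j - 1) * p**(j - 1) * (1 - p)**(n - j + 1)
    diffs.append(abs(lhs - rhs))
    print(j, round(lhs - rhs, 12))
```

Each printed difference is zero up to floating-point rounding.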
The Probability Density Functions of the Order Statistics
The probability density function of $Y_j$ is given by:

$$f_{Y_j}(y)=\frac{n!}{(j-1)! \, (n-j)!} \, F(y)^{j-1} \, [1-F(y)]^{n-j} \, f(y) \ \ \ \ \ \ \ \ (3)$$

where $f(y)=F'(y)$ is the common probability density function of the sample items.
We prove this by induction. Consider $j=1$. Note that $P(Y_1 \le y)$ is the probability that at least one $X_i \le y$ and is the complement of the probability of having no $X_i \le y$. Thus $F_{Y_1}(y)=1-[1-F(y)]^n$. By taking the derivative, we have:

$$f_{Y_1}(y)=n \, [1-F(y)]^{n-1} \, f(y)$$

which agrees with (3) for $j=1$.
Suppose we derive the pdf of $Y_{j-1}$ using (3) and obtain the following:

$$f_{Y_{j-1}}(y)=\frac{n!}{(j-2)! \, (n-j+1)!} \, F(y)^{j-2} \, [1-F(y)]^{n-j+1} \, f(y)$$
Now we take the derivative of (2) above and we have:

$$f_{Y_j}(y)=f_{Y_{j-1}}(y)-\binom{n}{j-1} \biggl[ (j-1) \, F(y)^{j-2} \, f(y) \, [1-F(y)]^{n-j+1} - (n-j+1) \, F(y)^{j-1} \, [1-F(y)]^{n-j} \, f(y) \biggr]$$
After simplifying the right hand side (the first term inside the brackets cancels with $f_{Y_{j-1}}(y)$, since $\binom{n}{j-1}(j-1)=\frac{n!}{(j-2)! \, (n-j+1)!}$), we obtain the pdf of $Y_j$ as in (3).
We would like to make two comments. One is that in terms of problem solving, it may be better to rely on the distribution function in (1) above to derive the pdf, since the thought process behind (1) is clear. The second is that the last three terms in the pdf in (3) are very instructive. Let's arrange these three terms as follows:

$$F(y)^{j-1} \ \ \ \ \ f(y) \ \ \ \ \ [1-F(y)]^{n-j}$$
Note that the first term is the probability that there are $j-1$ sample items below $y$. The middle term indicates that one sample item is right around $y$. The third term indicates that there are $n-j$ sample items above $y$. Thus the following multinomial probability is the pdf in (3):

$$f_{Y_j}(y)=\frac{n!}{(j-1)! \, 1! \, (n-j)!} \, F(y)^{j-1} \, f(y) \, [1-F(y)]^{n-j}$$
This heuristic approach is further described here.
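As a sanity check of the pdf formula, the following sketch (stdlib only; the uniform(0,1) distribution and the sample size are illustrative choices) verifies numerically that each order-statistic pdf integrates to 1:

```python
import math

def order_stat_pdf(y, j, n, F, f):
    """pdf of the j-th order statistic Y_j of a sample of size n,
    given the common distribution function F and density f."""
    c = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))
    return c * F(y) ** (j - 1) * (1 - F(y)) ** (n - j) * f(y)

# Check with the uniform(0,1) distribution: F(y) = y, f(y) = 1.
# Each pdf should integrate to 1 over (0,1); midpoint rule on a fine grid.
n, m = 5, 10_000
totals = {}
for j in range(1, n + 1):
    totals[j] = sum(order_stat_pdf((k + 0.5) / m, j, n, lambda t: t, lambda t: 1.0)
                    for k in range(m)) / m
    print(j, round(totals[j], 4))
```

Each printed total should be 1 up to the small midpoint-rule error.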
Suppose that a sample of size $n=5$ is drawn from the uniform distribution on the interval $(0,\theta)$. Find the pdfs of $Y_1$, $Y_3$ and $Y_5$. Find $E(Y_3)$.
Let $X$ be a random variable with the uniform distribution on $(0,\theta)$. The distribution function and pdf of $X$ are:

$$F(y)=\frac{y}{\theta} \ \ \ \ \ \ \ \ f(y)=\frac{1}{\theta} \ \ \ \ \ \ \ \ 0<y<\theta$$
Using (3), the following are the pdfs of $Y_1$, $Y_3$ and $Y_5$, each for $0<y<\theta$:

$$f_{Y_1}(y)=5 \biggl[1-\frac{y}{\theta}\biggr]^4 \, \frac{1}{\theta}$$

$$f_{Y_3}(y)=30 \biggl[\frac{y}{\theta}\biggr]^2 \biggl[1-\frac{y}{\theta}\biggr]^2 \, \frac{1}{\theta}$$

$$f_{Y_5}(y)=5 \biggl[\frac{y}{\theta}\biggr]^4 \, \frac{1}{\theta}$$
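These pdfs can be checked numerically. A sketch (stdlib only, assuming a sample of size 5 from the uniform distribution on $(0,\theta)$; the value $\theta=2$ is an arbitrary illustrative choice): for uniform samples, $Y_j/\theta$ has a Beta$(j, n-j+1)$ distribution, so $E(Y_j)=\theta \cdot \frac{j}{n+1}$, and the means computed from the pdfs above should match.

```python
import math

theta, n, m = 2.0, 5, 20_000    # theta = 2 is an arbitrary illustrative value

def pdf(j, y):
    """pdf of Y_j for a size-5 sample from uniform(0, theta), per formula (3)."""
    c = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))
    u = y / theta
    return c * u ** (j - 1) * (1 - u) ** (n - j) / theta

# E(Y_j) by the midpoint rule; compare with theta * j / (n + 1),
# the mean of a Beta(j, n-j+1) distribution scaled by theta.
means = {}
for j in (1, 3, 5):
    ys = ((k + 0.5) * theta / m for k in range(m))
    means[j] = sum(y * pdf(j, y) for y in ys) * theta / m
    print(j, round(means[j], 4), round(theta * j / (n + 1), 4))
```

In each printed row the two values should agree.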
In this example, $Y_3$ is the sample median and serves as a point estimate for the population median $\frac{\theta}{2}$. As an estimator of the median, we prefer not to systematically overestimate or underestimate (we call such an estimator an unbiased estimator). In this particular example, the sample median $Y_3$ is an unbiased estimator of $\frac{\theta}{2}$. To see this we show $E(Y_3)=\frac{\theta}{2}$.
$$E(Y_3)=\int_0^{\theta} y \cdot 30 \biggl[\frac{y}{\theta}\biggr]^2 \biggl[1-\frac{y}{\theta}\biggr]^2 \, \frac{1}{\theta} \, dy$$

By substituting $u=\frac{y}{\theta}$, we have the following beta integral:

$$E(Y_3)=30 \, \theta \int_0^1 u^3 \, (1-u)^2 \, du=30 \, \theta \cdot \frac{3! \, 2!}{6!}=\frac{\theta}{2}$$
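The unbiasedness of the sample median can also be seen by simulation. A sketch (stdlib only, assuming samples of size 5 from the uniform distribution on $(0,\theta)$; the value $\theta=10$ is an arbitrary illustrative choice):

```python
import random

random.seed(7)
theta = 10.0        # arbitrary illustrative value
trials = 200_000
acc = 0.0
for _ in range(trials):
    sample = sorted(random.uniform(0, theta) for _ in range(5))
    acc += sample[2]            # sample[2] is Y_3, the sample median
mc_mean = acc / trials
print(round(mc_mean, 2))        # should be near theta/2 = 5.0
```

The long-run average of the sample medians should be close to $\frac{\theta}{2}$, consistent with $E(Y_3)=\frac{\theta}{2}$.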
Practice problems are found here in a companion blog.
Revised April 6, 2015.