The sign test

What kind of significance test do we use for inference on the mean of an obviously non-normal population? If the sample is large, we can still use the t-test since the sampling distribution of the sample mean $\overline{X}$ is close to normal and the t-procedure is robust. If the sample size is small and the underlying distribution is clearly not normal (e.g. is extremely skewed), what significance test do we use? Let’s take the example of a matched pairs data problem. The matched pairs t-test tests the hypothesis that there is “no difference” between two continuous random variables $X$ and $Y$ that are paired. If the underlying distributions are normal or if the sample size is large, the matched pairs t-test is an excellent test. However, absent normality or large samples, the sign test is an alternative to the matched pairs t-test. In this post, we discuss how the sign test works and present Examples 1 and 2. Examples 3, 4 and 5 are shown in the next post, The sign test, more examples.

The sign test and the confidence intervals for percentiles (discussed in the previous post Confidence intervals for percentiles) are examples of distribution-free inference procedures. They are called distribution-free because no assumptions are made about the underlying distributions of the data measurements. For more information about distribution-free inferences, see [Hollander & Wolfe].

We discuss two types of problems for which the sign test is applicable – one-sample location problems and matched pairs data problems. In the one-sample problems, the sign test is to test whether the location (median) of the data has shifted. In the matched pairs problems, the sign test is to test whether the location (median) of one variable has shifted in relation to the matched variable. Thus, the test hypotheses must be restated in terms of the median if the sign test is to be used as an alternative to the t-test. With the sign test, the question is “has the median changed?” whereas the question is “has the mean changed?” for the t-test.

The sign test is one of the simplest distribution-free procedures. It is an excellent choice for a significance test when the sample size is small and the data are highly skewed or have outliers. In such cases, the sign test is preferred over the t-test. The price is power. For the matched pairs problems, the sign test only looks at the signs of the differences of the data pairs; the magnitudes of the differences are not taken into account. Because the sign test does not use all the information contained in the data, it is generally less powerful than the t-test when the population is close to normal.

How the sign test works
Suppose that $(X,Y)$ is a pair of continuous random variables. Suppose that a random sample of paired data $(X_1,Y_1),(X_2,Y_2), \cdots, (X_n,Y_n)$ is obtained. We omit the observations $(X_i,Y_i)$ with $X_i=Y_i$. Let $m$ be the number of pairs for which $X_i \ne Y_i$. For each of these $m$ pairs, we make a note of the sign of the difference $X_i-Y_i$ ($+$ if $X_i>Y_i$ and $-$ if $X_i<Y_i$). Let $W$ be the number of $+$ signs out of these $m$ pairs. The sign test gets its name from the fact that its test statistic $W$ is a count of signs. Thus we are only considering the signs of the differences in the paired data and not the magnitude of the differences. The sign test is also called the binomial test since the statistic $W$ has a binomial distribution.
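The counting step described above can be sketched in a few lines of Python. This is only an illustration: the function name `sign_counts` and the toy data are made up for this sketch and do not come from any library or from the examples in this post.

```python
def sign_counts(xs, ys):
    """Return (m, w) for paired data.

    m = number of pairs with x != y (ties are dropped),
    w = number of those pairs with x > y (the '+' signs).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    diffs = [x - y for x, y in zip(xs, ys)]
    nonzero = [d for d in diffs if d != 0]  # omit pairs with X_i = Y_i
    m = len(nonzero)
    w = sum(1 for d in nonzero if d > 0)    # count the '+' signs only
    return m, w


# Hypothetical toy data: the second pair is a tie and is dropped.
xs = [5.1, 4.8, 6.0, 5.5, 4.9]
ys = [4.7, 4.8, 6.3, 5.0, 4.1]
m, w = sign_counts(xs, ys)
print(m, w)  # prints: 4 3
```

Note that only the signs of the differences enter the result; rescaling any nonzero difference leaves $m$ and $W$ unchanged.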

Let $p=P[X>Y]$. Note that this is the probability that a data pair $(X,Y)$ has a $+$ sign. If $p=\frac{1}{2}$, then any random pair $(X,Y)$ has an equal chance of being a $+$ or a $-$ sign. The null hypothesis $H_0:p=\frac{1}{2}$ is the hypothesis of “no difference”. Under this hypothesis, there is no difference between the two measurements $X$ and $Y$. The sign test tests the null hypothesis $H_0:p=\frac{1}{2}$ against any one of the following alternative hypotheses:

$\displaystyle H_1:p<\frac{1}{2} \ \ \ \ \ \text{(Left-tailed)}$
$\displaystyle H_1:p>\frac{1}{2} \ \ \ \ \ \text{(Right-tailed)}$
$\displaystyle H_1:p \ne \frac{1}{2} \ \ \ \ \ \text{(Two-tailed)}$

The statistic $W$ is the number of successes in $m$ independent trials, each of which has probability of success $p=P[X>Y]$. Thus $W \sim \text{Binomial}(m,p)$. When $H_0$ is true, $W \sim \text{Binomial}(m,\frac{1}{2})$. Thus the binomial distribution is used for calculating significance. If $w$ is the observed value of $W$, the left-tailed P-value is $P[W \le w]$ and the right-tailed P-value is $P[W \ge w]$. The two-tailed P-value is twice the smaller of these two one-sided P-values (and is taken to be 1 if the doubling exceeds 1).
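As a rough sketch, the binomial tail probabilities can be computed directly with the Python standard library. The function name `sign_test_p_value` and its `alternative` argument are hypothetical names chosen for this illustration. The two-sided value is taken here as twice the smaller tail, capped at 1, which is one common convention and an assumption of this sketch.

```python
from math import comb


def sign_test_p_value(w, m, alternative="greater"):
    """P-value of the sign test when W ~ Binomial(m, 1/2) under H0.

    w: observed number of '+' signs, m: number of untied pairs.
    alternative: "less", "greater" or "two-sided" (names are illustrative).
    """
    if not 0 <= w <= m:
        raise ValueError("w must satisfy 0 <= w <= m")
    # Under H0 every outcome k has probability C(m, k) / 2^m.
    left = sum(comb(m, k) for k in range(0, w + 1)) / 2**m   # P[W <= w]
    right = sum(comb(m, k) for k in range(w, m + 1)) / 2**m  # P[W >= w]
    if alternative == "less":
        return left
    if alternative == "greater":
        return right
    if alternative == "two-sided":
        # Assumption: double the smaller tail and cap the result at 1.
        return min(1.0, 2 * min(left, right))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


print(sign_test_p_value(3, 4, "greater"))  # prints: 0.3125
```

With $m=4$ and $w=3$, the right tail is $(4+1)/16=0.3125$, which is easy to verify by hand.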

The sign test can also be viewed as testing the hypothesis that the median of the differences is zero. Let $m_d$ be the median of the differences $X-Y$. The null hypothesis $H_0:p=\frac{1}{2}$ is equivalent to the hypothesis $H_0:m_d=0$. For the alternative hypotheses, we have the following equivalences:

$\displaystyle H_1:p<\frac{1}{2} \ \ \ \equiv \ \ \ H_1:m_d<0$

$\displaystyle H_1:p>\frac{1}{2} \ \ \ \equiv \ \ \ H_1:m_d>0$

$\displaystyle H_1:p \ne \frac{1}{2} \ \ \ \equiv \ \ \ H_1:m_d \ne 0$

Example 1
A running club conducts a 6-week training program to prepare 20 middle-aged amateur runners for a 5K running race. The following table shows the running times (in minutes) before and after the training program. Note that five kilometers is approximately 3.1 miles.

$\displaystyle \begin{pmatrix} \text{Runner}&\text{Pre-training}&\text{Post-training}&\text{Diff} \\{1}&57.5&54.9&2.6 \\{2}&52.4&53.5&-1.1 \\{3}&59.2&49.0&10.2 \\{4}&27.0&24.5&2.5 \\{5}&55.8&50.7&5.1 \\{6}&60.8&57.5&3.3 \\{7}&40.6&37.2&3.4 \\{8}&47.3&42.3&5.0 \\{9}&43.9&47.3&-3.4 \\{10}&43.7&34.8&8.9 \\{11}&60.8&53.3&7.5 \\{12}&43.9&33.8&10.1 \\{13}&45.6&41.7&3.9 \\{14}&40.6&41.5&-0.9 \\{15}&54.1&52.5&1.6 \\{16}&50.7&52.4&-1.7 \\{17}&25.4&25.9&-0.5 \\{18}&57.5&54.7&2.8 \\{19}&43.9&38.7&5.2 \\{20}&43.9&39.9&4.0 \end{pmatrix}$

The difference is taken to be pre-training time minus post-training time. Use the sign test to test whether the training program improves run time.

For a given runner, let $X$ be a random pre-training running time and $Y$ be a random post-training running time. The hypotheses to be tested are:

$\displaystyle H_0:p=\frac{1}{2} \ \ \ \ \ H_1:p>\frac{1}{2} \ \ \ \text{where} \ p=P[X>Y]$

Under the null hypothesis $H_0$, there is no difference between the pre-training run time and post-training run time. The difference is equally likely to be a plus sign or a minus sign. Let $W$ be the number of runners in the sample for which $X_i-Y_i>0$. None of the 20 differences is zero, so $m=20$ and, under $H_0$, $W \sim \text{Binomial}(20,0.5)$. The observed value of the statistic $W$ is $w=15$. Since this is a right-tailed test, the following is the P-value:

$\displaystyle \text{P-value}=P[W \ge 15]=\sum \limits_{k=15}^{20} \binom{20}{k} \biggl(\frac{1}{2}\biggr)^{20}=0.02069$

Because of the small P-value, the result of 15 out of 20 runners having improved run time is unlikely to be due to random chance alone. So we reject $H_0$ (for example, at the significance level $\alpha=0.05$) and we have good reason to believe that the training program reduces run time.
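The calculation in Example 1 can be checked with a short Python sketch. The data are copied from the table in Example 1, the variable names are illustrative, and only the standard library is used.

```python
from math import comb

# Running times in minutes, copied from the table in Example 1.
pre = [57.5, 52.4, 59.2, 27.0, 55.8, 60.8, 40.6, 47.3, 43.9, 43.7,
       60.8, 43.9, 45.6, 40.6, 54.1, 50.7, 25.4, 57.5, 43.9, 43.9]
post = [54.9, 53.5, 49.0, 24.5, 50.7, 57.5, 37.2, 42.3, 47.3, 34.8,
        53.3, 33.8, 41.7, 41.5, 52.5, 52.4, 25.9, 54.7, 38.7, 39.9]

# m = number of untied pairs, w = number of '+' signs (pre > post).
m = sum(1 for x, y in zip(pre, post) if x != y)
w = sum(1 for x, y in zip(pre, post) if x > y)

# Right-tailed P-value P[W >= w] with W ~ Binomial(m, 1/2).
p_value = sum(comb(m, k) for k in range(w, m + 1)) / 2**m

print(m, w, round(p_value, 5))  # prints: 20 15 0.02069
```

The sum of the binomial coefficients from $k=15$ to $k=20$ is 21700, and $21700/2^{20} \approx 0.02069$, which agrees with the P-value above.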

Example 2
A car owner is curious about the effect of oil changes on gas mileage. For each of 17 oil changes, he recorded data for miles per gallon (MPG) prior to the oil change and after the oil change. The following table shows the data, with the difference taken to be post minus pre:

$\displaystyle \begin{pmatrix} \text{Oil Change}&\text{MPG (Pre)}&\text{MPG (Post)}&\text{Diff} \\{1}&24.24&27.45&3.21 \\{2}&24.33&24.60&0.27 \\{3}&24.45&28.27&3.82 \\{4}&23.37&22.49&-0.88 \\{5}&26.73&28.67&1.94 \\{6}&30.40&27.51&-2.89 \\{7}&29.57&29.28&-0.29 \\{8}&22.27&23.18&0.91 \\{9}&27.00&27.64&0.64 \\{10}&24.95&26.01&1.06 \\{11}&27.12&27.39&0.27 \\{12}&28.53&28.67&0.14 \\{13}&27.55&30.27&2.72 \\{14}&30.17&27.83&-2.34 \\{15}&26.00&27.78&1.78 \\{16}&27.52&29.18&1.66 \\{17}&34.61&33.04&-1.57\end{pmatrix}$

Regular oil changes are obviously crucial to maintaining the overall health of the car. It seems to make sense that oil changes would improve gas mileage. Is there evidence that this is the case? Do the analysis using the sign test.

In this example we set the hypotheses in terms of the median. For a given oil change, let $X$ be the post oil change MPG and $Y$ be the pre oil change MPG. Consider the differences $X-Y$. Let $m_d$ be the median of the differences $X-Y$. We test the null hypothesis that there is no change in MPG before and after oil change against the alternative hypothesis that the median of the post oil change MPG has shifted to the right in relation to the pre oil change MPG. We have the following hypotheses:

$\displaystyle H_0:m_d=0 \ \ \ \ \ H_1:m_d>0$

Let $W$ be the number of oil changes with positive differences in MPG (post minus pre). None of the 17 differences is zero, so under $H_0$, $W \sim \text{Binomial}(17,0.5)$. The observed value of the statistic $W$ is $w=12$. Since this is a right-tailed test, the following is the P-value:

$\displaystyle \text{P-value}=P[W \ge 12]=\sum \limits_{k=12}^{17} \binom{17}{k} \biggl(\frac{1}{2}\biggr)^{17}=0.07173$

At the significance level of $\alpha=0.10$, we reject the null hypothesis (at $\alpha=0.05$ we would not, since $0.07173>0.05$). However, we would like to add a caveat. This example is useful mainly as a demonstration of the sign test, because the 17 oil changes are not controlled. The data are just records of mileage and gas usage for 17 oil changes (both pre and post). No effort was made to make sure that the driving conditions are similar for the pre oil change MPG and post oil change MPG (freeway vs. local streets, weather conditions, etc). With more care in producing the data, we can conceivably derive a more definite answer.
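As with Example 1, the numbers in Example 2 can be reproduced with a small Python sketch. The MPG values are copied from the table in Example 2, the variable names are illustrative, and the comparison against two significance levels is included only to show how the decision depends on $\alpha$.

```python
from math import comb

# MPG before and after each oil change, copied from the table in Example 2.
pre = [24.24, 24.33, 24.45, 23.37, 26.73, 30.40, 29.57, 22.27, 27.00,
       24.95, 27.12, 28.53, 27.55, 30.17, 26.00, 27.52, 34.61]
post = [27.45, 24.60, 28.27, 22.49, 28.67, 27.51, 29.28, 23.18, 27.64,
        26.01, 27.39, 28.67, 30.27, 27.83, 27.78, 29.18, 33.04]

# Differences are post minus pre, so a '+' sign means post > pre.
m = sum(1 for x, y in zip(post, pre) if x != y)
w = sum(1 for x, y in zip(post, pre) if x > y)

# Right-tailed P-value P[W >= w] with W ~ Binomial(m, 1/2).
p_value = sum(comb(m, k) for k in range(w, m + 1)) / 2**m

print(m, w, round(p_value, 5))  # prints: 17 12 0.07173
print(p_value < 0.10)           # prints: True  (reject H0 at alpha = 0.10)
print(p_value < 0.05)           # prints: False (do not reject at alpha = 0.05)
```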

Reference
Myles Hollander and Douglas A. Wolfe, Nonparametric Statistical Methods, Second Edition, Wiley (1999)
