The sign test, more examples

This is a continuation of the previous post The sign test. Examples 1 and 2 are presented in the previous post. In this post we present three more examples. Example 3 is a matched pairs problem and is an example demonstrating that the sign test may not as powerful as the t-test when the population is close to normal. Example 4 is a one-sample location problem. Example 5 is an example of an application of the sign test when the outcomes of the study or experiment are not numerical. For more information about distribution-free inferences, see [Hollander & Wolfe].

Example 3
Courses in introductory statistics are increasingly popular at community colleges across the United States. These are statistics courses that teach basic concepts of descriptive statistics, probability notions and basic inferential statistical procedures such as one and two-sample t procedures. A certain teacher of statistics at a local community college believes that taking such a course improves students’ quantitative skills. At the beginning of one semester, this professor administered a quantitative diagnostic test to a group of 15 students taking an introductory statistics course. At the end of the semester, the professor administered a second quantitative diagnostic test. The maximum possible score on each test is 50. Though the second test was at a similar level of difficulty as the first test, the questions in the second test were different and the contexts of the problems were different. Thus simply taking the first test should not improve the second test. The following matrices show the scores before and after taking the statistics course:

\displaystyle \begin{pmatrix} \text{Student}&\text{Pre-Statistics}&\text{Post-Statistics}&\text{Diff} \\{1}&17&21&4 \\{2}&26&26&0 \\{3}&16&19&3 \\{4}&28&26&-2 \\{5}&23&30&7 \\{6}&35&40&5 \\{7}&41&43&2 \\{8}&18&15&-3 \\{9}&30&29&-1 \\{10}&29&31&2 \\{11}&45&46&1 \\{12}&8&7&-1 \\{13}&38&43&5 \\{14}&31&31&0 \\{15}&36&37&1 \end{pmatrix}

Is there evidence that taking introductory statistics course at community colleges improves students’ quantitative skills? Do the analysis using the sign test.

For a given student, let X be the post-statistics score on the diagnostic test and let Y be the pre-statistics score on the disgnostic test. Let p=P[X>Y]. This is the probability that the student has an improvement on the quantitative test after taking a one-semester introductory statistics course. The test hypotheses are as follows:

\displaystyle H_0:p=\frac{1}{2} \ \ \ \ H_1:p>\frac{1}{2}

Another interpretation of the above alternative hypothesis is that the median of the post-statistics quantitative scores has moved upward. Let W be the number of students with an improvement between the post and pre scores. Since there are two students with a zero difference, under H_0, W \sim \text{binomial}(13,0.5). Then the observed value of W is w=9. The following is the P-value:

\displaystyle \text{P-value}=P[W \ge 9]=\sum \limits_{k=9}^{13} \binom{13}{k} \biggl(\frac{1}{2}\biggr)^{13}=0.1334

If we want to set the probability of a type I error at 0.10, we would not reject the null hypothesis H_0. Thus based on the sign test, it appears that merely taking an introductory statistics course may not improve a student’s quantitative skills.

The data set for the differences in scores appears symmetric and has no strong skewness and no obvious outliers. So it should be safe to use the t-test. With \mu_d being the mean of X-Y, the hypotheses for the t-test are:

\displaystyle H_0:\mu_d=0 \ \ \ \ H_1:\mu_d>0

We obtain: t-score=2.08 and the P-value=0.028. Thus with the t-test, we would reject the null hypothesis and have the opposite conclusion. Because the sign test does not use all the available information in the data, it is not as powerful as the t-test.

Example 4
Acid rain is an environmental challenge in many places around the world. It refers to rain or any other form of precipitation that is unusually acidic, i.e. rainwater having elevated levels of hydrogen ions (low pH). The measure of pH is a measure of the acidity or basicity of a solution and has a scale ranging from 0 to 14. Distilled water, with carbon dioxide removed, has a neutral pH level of 7. Liquids with a pH less than 7 are acidic. However, even unpolluted rainwater is slightly acidic with pH varying between 5.2 to 6.0 due to the fact that carbon dioxide and water in the air react together to form carbonic acid. Thus, rainwater is only considered acidic if the pH level is less than 5.2.

In a remote region in Washington state, an enviromental biologist measured the pH levels of rainwater and obtained the following data for 16 rainwater samples on 16 different dates:

\displaystyle \begin{pmatrix} 4.73&4.79&4.87&4.88 \\{5.04}&5.06&5.07&5.09 \\{5.11}&5.16&5.18&5.21 \\{5.23}&5.24&5.25&5.25 \end{pmatrix}

Is there reason to believe that the rainwater from this region is considered acidic (less than 5.2)? Use the sign test to perform the analysis.

Let X be the pH level of a sample of rainwater in this region of Washington state. Let p=P[5.2>X]=P[5.2-X>0]. Thus p is the probability of a plus sign when comparing the each data measurement and 5.2. The hypotheses to be tested are:

\displaystyle H_0:p=\frac{1}{2} \ \ \ \ H_1:p>\frac{1}{2}

The null hypothesis H_0 is equivalent to the statement that the median pH level is 5.2. If the median pH level is less than 5.2, then a data measurement will be more likely to have a plus sign. Thus the above alternative hypothesis is the statement that the median pH level is less than 5.2.

Let W be the number of plus signs (i.e. 5.2-X>0). Then W \sim \text{binomial}(16,0.5). There are 11 data measurements with plus signs (w=11). Thus the P-value is:

\displaystyle \text{P-value}=P[W \ge 11]=\sum \limits_{k=11}^{16} \binom{16}{k} \biggl(\frac{1}{2}\biggr)^{16}=0.1051

At the level of significance \alpha=0.05, the null hypothesis is not rejected. We still believe that the rainwater in this region is not acidic.

Example 5
There are two statistics instructors who are both sought after by students in a local college. Let’s call them instructor A and instructor B. The math department conducted a survey to find out who is more popular with the students. In surveying 15 students, the department found that 11 of the students prefer instructor B over instructor A. Use the sign test to test the hypothesis of no difference in popularity against the alternative hypothesis that instructor B is more popular.

More than \frac{2}{3} of the students in the sample prefer instructor B over A. This seems like convincing evidence that B is indeed more popular. Let perform some calculation to confirm this. Let W be the number of students in the sample who prefer B over A. The null hypothesis is that A and B are equally popular. The alternative hypothesis is that B is more popular. If the null hypothesis is true, then W \sim \text{binomial}(15,0.5). Then the P-value is:

\displaystyle \text{P-value}=P[W \ge 11]=\sum \limits_{k=11}^{15} \binom{15}{k} \biggl(\frac{1}{2}\biggr)^{15}=0.05923

This P-value suggests that we have strong evidence that instructor B is more popular among the students.

Myles Hollander and Douglas A. Wolfe, Non-parametric Statistical Methods, Second Edition, Wiley (1999)