
Non-parametric Tests

Shengping Yang PhDa, Gilbert Berdine MDb

Correspondence to Shengping Yang PhD
Email: Shengping.yang@ttuhsc.edu

a Biostatistician, Department of Pathology, Texas Tech University Health Sciences Center, Lubbock, TX
b Pulmonary physician, Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX

SWRCCC 2014;2(8):63-67
doi:10.12746/swrccc2014.0208.109

............................................................

I was working on a small study recently to compare drug metabolite concentrations in the blood between two administration regimes. However, the metabolite concentrations for a few patients were so low that they could not be detected by the instrument I was using. I would like to know more about how to analyze data from such a study.

In some studies, the instrument used cannot provide precise measurements of the outcome of interest for some of the samples. In such cases, a value such as “undetectable” is usually assigned to those samples. Statistically, these data are difficult to analyze with parametric methods, such as the t test or ANOVA, without making major assumptions or censoring. For example, suppose we assign two different arbitrary values (below the detection threshold) to the non-detectable observations; we might get very different results, because assigning different values to the non-detectable results changes the mean and variance of the whole sample. As a simple and easy-to-implement alternative, a non-parametric method is usually recommended.

Non-parametric tests are also called distribution-free statistics because they do not require that the data fit a known parameterized distribution, such as the normal. Since they make far fewer assumptions about the data, these tests are widely used for many types of data, including rank data, categorical data, and data with “non-detectable” values.

Analogous to many of the parametric tests, there are a number of commonly used non-parametric tests for specific types of comparisons.

1. Mann-Whitney U Test (also Wilcoxon Rank Sum Test):

This test is commonly used to compare two independent groups of ordinal or ranked data to determine whether their medians differ significantly. It is the non-parametric equivalent of the widely used two-sample t test.

2. Kruskal-Wallis Test:

This test extends the Mann-Whitney U test to more than two groups and is the non-parametric equivalent of the one-way Analysis of Variance (ANOVA).

3. Wilcoxon Signed-Rank Test:

This test compares two related samples, e.g., paired/matched samples or repeated measures on the same subjects, to make inferences as to whether the mean ranks of the two related populations differ. It is the non-parametric equivalent of the paired t test.

4. Friedman's Test:

This test is used to detect differences in treatments across repeated measures on the same samples. It is the non-parametric equivalent of the repeated measures ANOVA.

The principle of a non-parametric test is to make no assumptions about the distribution of the outcome variable and instead to use the ranks of the data for making statistical inferences. We will use the Mann-Whitney U test to explain how this works. The Mann-Whitney U test has two basic assumptions: the observations are independent of each other, and the data values are ordinal – that is, one can compare any two data values and objectively state which is greater.

In the study mentioned above, the objective is to compare the drug metabolite concentrations in the blood between two administration regimes. The hypothetical data are presented below. The first row gives the metabolite concentrations for patients who took the drug in capsule form (nC = 4), and the second row the concentrations for patients who took the drug in tablet form (nT = 5). The total number of patients in this study is

N = nC + nT = 4 + 5 = 9

Capsule: 0.59, 0.31, 1.22, 0.52
Tablet:  0.11, Non-detectable*, 0.31, 0.05, 0.53

* Detection threshold is 0.01 µg/l.

Since one patient had a non-detectable blood metabolite concentration, the commonly used parametric tests are not appropriate; we will apply a non-parametric test to these data. Note that the patients who took the drug in capsules are independent of (not paired/matched with) those who took the drug in tablets; thus a Mann-Whitney U test, rather than a Wilcoxon Signed-Rank test, should be used.

First, we define the null and alternative hypotheses of the Mann-Whitney test:

H0: There is no difference in the ranks of metabolite concentrations between the two regimes;

HA: There is a difference in the ranks of metabolite concentrations between the two regimes.

The null (H0) hypothesis can be mathematically stated in two ways. The general meaning is that the probability of drawing larger values from the first population than the second population is equal to the probability of drawing larger values from the second population than the first population. A stricter expression of H0 is that the median values of the two populations are equal.

To assign ranks to the data, we order the combined samples of the two administration regimes while keeping track of the group membership (table below). In other words, ranks are assigned to individual observations regardless of which group they belong to, while the grouping information is retained. Note that when ties are present, we average the ranks. For example, the 4th and 5th ordered values are both 0.31, so we assign the averaged rank of 4.5 to both of them.

Ordered observations and their ranks:

Value            Group     Rank
Non-detectable   Tablet    1
0.05             Tablet    2
0.11             Tablet    3
0.31             Capsule   4.5*
0.31             Tablet    4.5*
0.52             Capsule   6
0.53             Tablet    7
0.59             Capsule   8
1.22             Capsule   9

* The 4th and 5th ordered values are both 0.31; the mean rank of 4.5 was assigned to both of them.
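The ranking step can be sketched in plain Python (illustrative only; the non-detectable observation is imputed as 0.005 purely to place it below every detected value, and all variable names are ours):

```python
# Pool the two samples, sort, and assign average ranks to ties.
# "Non-detectable" is imputed as 0.005 (any value below the 0.01 µg/l
# threshold gives the same ranks).
capsule = [0.59, 0.31, 1.22, 0.52]
tablet = [0.11, 0.005, 0.31, 0.05, 0.53]

pooled = [(v, "Capsule") for v in capsule] + [(v, "Tablet") for v in tablet]
pooled.sort(key=lambda pair: pair[0])

ranks = {}
i = 0
while i < len(pooled):
    j = i
    while j < len(pooled) and pooled[j][0] == pooled[i][0]:
        j += 1                      # j is one past the tie run starting at i
    avg_rank = (i + 1 + j) / 2      # mean of positions i+1 .. j
    for k in range(i, j):
        ranks[k] = avg_rank
    i = j

rank_sum = {"Capsule": 0.0, "Tablet": 0.0}
for k, (_, group) in enumerate(pooled):
    rank_sum[group] += ranks[k]

print(rank_sum)   # {'Capsule': 27.5, 'Tablet': 17.5}
```

The two 0.31 observations occupy positions 4 and 5, so both receive the averaged rank 4.5, matching the table above.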

The next step is to calculate the U statistic. The distribution of U under the null hypothesis is known. Tables of this distribution for small samples are available. For samples larger than 20, the distribution is approximated to be normal. The calculation can be done manually or using a formula.

To manually determine U, pick the sample that seems to have the smaller values. The final result is independent of which group is chosen, but one group requires less effort. For our example, pick the Tablet data. For each Tablet data value, count how many Capsule data values are less than it, scoring ties as 0.5, and add all these counts together. For our example, Non-detectable has 0 Capsule data values less than it, 0.05 has 0, 0.11 has 0, 0.31 has 0 plus 1 tie (scored 0.5), and 0.53 has 2. For our example:

UT = 0 + 0 + 0 + 0.5 + 2 = 2.5

If the Capsule data is used as the reference, one gets a different, but predetermined, result:

UC = 5 + 3.5 + 5 + 4 = 17.5

The sum (UT + UC) must equal the number of possible ways to compare nT things against nC things:

UT + UC = 2.5 + 17.5 = 20 = nT × nC = 5 × 4
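The counting procedure above can be sketched as a short Python check (the helper name u_stat is ours, and 0.005 again stands in for the non-detectable value):

```python
# Count-based U: for each value in `group`, count values in `other`
# that are smaller, scoring ties as 0.5.
capsule = [0.59, 0.31, 1.22, 0.52]
tablet = [0.11, 0.005, 0.31, 0.05, 0.53]

def u_stat(group, other):
    """Pairs where `group` beats `other`; ties count 0.5."""
    return sum((g > o) + 0.5 * (g == o) for g in group for o in other)

u_t = u_stat(tablet, capsule)
u_c = u_stat(capsule, tablet)
print(u_t, u_c, u_t + u_c)   # 2.5 17.5 20.0 (= nT * nC)
```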

The above algorithm can be automated by calculating the sum of the ranks for the capsule and tablet groups separately. For the hypothetical data, the rank sums of the Capsule and Tablet groups are RC = 27.5 (4.5 + 6 + 8 + 9) and RT = 17.5 (1 + 2 + 3 + 4.5 + 7), respectively. Note that it is always good practice to check whether the total sum of ranks (both groups included) equals N(N + 1)/2, to make sure that all the ranks are calculated correctly. In our calculation, N = 9, so N(N + 1)/2 = 45, which does equal 27.5 + 17.5.

The next step is to calculate the U value, which is the statistic used for making the inference. U is the minimum of UT and UC, which can be computed from the rank sums:

UT = RT − nT(nT + 1)/2 = 17.5 − 15 = 2.5

UC = RC − nC(nC + 1)/2 = 27.5 − 10 = 17.5

Then U = min(UT, UC) = 2.5.

Note that in each formula, the first term is the rank sum for the chosen group, and the subtracted term is the smallest rank sum that group could possibly have (the sum it would have if it occupied the lowest ranks); the difference therefore counts the comparisons the chosen group “wins.”
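As a quick arithmetic check, the rank-sum formulas can be evaluated in a few lines of Python (values taken directly from the worked example):

```python
# U from rank sums: U_X = R_X - n_X(n_X + 1)/2, where R_X is the
# rank sum of group X in the pooled ranking.
n_t, n_c = 5, 4          # tablet and capsule sample sizes
r_t, r_c = 17.5, 27.5    # rank sums from the table above

u_t = r_t - n_t * (n_t + 1) / 2   # 17.5 - 15 = 2.5
u_c = r_c - n_c * (n_c + 1) / 2   # 27.5 - 10 = 17.5
u = min(u_t, u_c)
print(u)   # 2.5, matching the count-based calculation
```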

The U value is converted to a significance or p value using the known distribution of U under the null hypothesis. For large samples, a normal approximation can be used:

We define,

Z = (U − mU) / σU

where mU = nTnC/2 (the mean of U under the null hypothesis), and

σU = √{ (nTnC/12) × [ (N + 1) − Σj (tj³ − tj) / (N(N − 1)) ] }

(the standard deviation for U). Note that J is the number of groups of ties, and tj is the number of tied ranks in group j. If there are no ties in the data, the formula reduces to

σU = √{ nTnC(N + 1)/12 }

The value Z is the difference between the observed U and its median value under the null hypothesis (50% greater and 50% less), normalized to the standard deviation of the U statistic for the data. Tables (and computations) of p values from Z values are readily available.
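The tie-corrected standard deviation and Z for this example can be computed with a short Python sketch (one tie group, the two 0.31 values, so the correction sum has a single term t = 2):

```python
import math

# Normal approximation for U with the tie correction.
n_t, n_c = 5, 4
n = n_t + n_c
u = 2.5                                # min(UT, UC) from the example

m_u = n_t * n_c / 2                    # 10.0, the null mean of U
tie_sum = sum(t**3 - t for t in [2])   # = 6, one tie group of size 2
sigma_u = math.sqrt(n_t * n_c / 12 * ((n + 1) - tie_sum / (n * (n - 1))))
z = (u - m_u) / sigma_u
print(round(sigma_u, 4), round(z, 4))  # 4.0654 -1.8448
```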

Applying these formulas to the above example, we have

mU = 5 × 4 / 2 = 10, σU = √{ (20/12) × [10 − (2³ − 2)/(9 × 8)] } ≈ 4.065

Z = (2.5 − 10)/4.065 ≈ −1.84

Since Z follows a standard normal distribution, the probability of observing a value equal to or more extreme than the observed one, given that the null hypothesis is true, is 2 × P(Z ≤ −1.84) for a two-sided test. In this example, the p value is 2 × P(Z ≤ −1.84) ≈ 0.065.

Since the p value is greater than 0.05, we do not reject H0 and conclude that there is not sufficient evidence that the ranks of metabolite concentrations differ between the two regimes.

It may, at first glance, seem inappropriate to apply the mathematics of normal distributions to data that are known not to be normally distributed. This is the beauty of using rank methods to analyze the data. Any data point can be greater than, less than, or equal to the independent data point with which it is being compared. There are no other possibilities. Under the null hypothesis, the probability of a given data point having a greater value than the point it is being compared with must equal the probability of its having a lesser value. Each comparison is reduced to a coin flip, so the accumulated comparisons behave like a random walk, which is approximately normally distributed for large N.

Since U has a discrete distribution (U is derived from ranks, so it can take only certain values) while Z follows a normal distribution, which is continuous (it can take any value between −infinity and +infinity), a continuity correction is often applied to account for approximating a discrete distribution with a continuous one. In other words, the cumulative probability of a discrete random variable has jumps; to approximate it with a continuous distribution, a correction that spreads the probability uniformly over an interval is recommended, especially when the sample size is small. In this case, the Z value after applying the continuity correction is,

Z = (2.5 + 0.5 − 10)/4.065 ≈ −1.72

and the corresponding p value for a two sided test is 0.0851.
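Both the uncorrected and continuity-corrected p values can be verified with a short Python sketch using the standard normal CDF (the helper norm_cdf is ours):

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

u, m_u = 2.5, 10.0
n_t, n_c, n = 5, 4, 9
# Tie-corrected sigma_U (one tie group of size 2).
sigma_u = math.sqrt(n_t * n_c / 12 * ((n + 1) - (2**3 - 2) / (n * (n - 1))))

z_raw = (u - m_u) / sigma_u        # no continuity correction
z_cc = (u + 0.5 - m_u) / sigma_u   # continuity-corrected
p_raw = 2 * norm_cdf(z_raw)        # two-sided p values (z < 0 here)
p_cc = 2 * norm_cdf(z_cc)
print(round(z_cc, 4), round(p_cc, 4))   # -1.7218 0.0851
```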

A number of statistical software packages can be used to perform a Mann-Whitney U test. For example, the R code for the above Mann-Whitney test is:

Capsule = c(0.59, 0.31, 1.22, 0.52)
Tablet = c(0.11, 0.005, 0.31, 0.05, 0.53)
wilcox.test(Tablet, Capsule, correct=TRUE)

Use correct=FALSE if continuity is not to be adjusted.

 

The output from R is (with continuity correction):

        Wilcoxon rank sum test with continuity correction

data:  Tablet and Capsule
W = 2.5, p-value = 0.0851
alternative hypothesis: true location shift is not equal to 0

Note that the non-detectable observation was assigned the value 0.005, which is half of the lower detection threshold. In fact, assigning any value less than 0.01 would be acceptable: since a non-parametric test uses the ranks of the data to make inferences, the result will be the same as long as the assigned value is below the threshold. In contrast, assigning different values to the non-detectable observations when using a parametric test can sometimes produce very different results.
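This invariance is easy to demonstrate with a small Python sketch: any imputed value below the threshold leaves the U statistic unchanged (the helper u_stat is ours):

```python
# Any imputed value below the 0.01 µg/l detection threshold yields the
# same U, because only ranks enter the statistic.
capsule = [0.59, 0.31, 1.22, 0.52]

def u_stat(group, other):
    """Pairs where `group` beats `other`; ties count 0.5."""
    return sum((g > o) + 0.5 * (g == o) for g in group for o in other)

results = []
for imputed in (0.009, 0.005, 0.001, 0.0):
    tablet = [0.11, imputed, 0.31, 0.05, 0.53]
    results.append(u_stat(tablet, capsule))

print(results)   # [2.5, 2.5, 2.5, 2.5]
```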

 

The SAS code for a Mann-Whitney test is:

proc npar1way data=data wilcoxon correct=yes; * use correct=no if continuity is not to be adjusted;

            class regime;

            var concentration;

run;

The output from SAS is (with continuity correction):

Wilcoxon Two-Sample Test
Normal Approximation
Z                     1.7218
One-Sided Pr > Z      0.0425
Two-Sided Pr > |Z|    0.0851

Z includes a continuity correction of 0.5.
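For completeness, an equivalent check can be run in Python with SciPy (assuming SciPy is installed; with ties present the exact p value is unavailable, so the asymptotic method with continuity correction is requested):

```python
# SciPy's Mann-Whitney U test applies the tie correction automatically
# in the asymptotic method; 0.005 is imputed for the non-detectable value.
from scipy.stats import mannwhitneyu

capsule = [0.59, 0.31, 1.22, 0.52]
tablet = [0.11, 0.005, 0.31, 0.05, 0.53]

res = mannwhitneyu(tablet, capsule, use_continuity=True,
                   alternative="two-sided", method="asymptotic")
print(res.statistic, round(res.pvalue, 4))   # 2.5 0.0851
```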

 

In summary, a non-parametric test is a very useful tool for analyzing your data when the sample size is comparatively small and the distribution of the outcome is unknown and cannot be assumed to be approximately normal.

 

References

  1. Buckle N, Kraft C, van Eeden C. An approximation to the Wilcoxon-Mann-Whitney distribution. J Am Stat Assoc 1969;64(326):591-599.
  2. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 1947;18(1):50-60. doi:10.1214/aoms/1177730491.
  3. The NPAR1WAY procedure. http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#npar1way_toc.htm (last accessed: 9/23/2014)
  4. Wilcoxon F. Individual comparisons by ranking methods. Biometrics Bulletin 1945;1(6):80-83. doi:10.2307/3001968.
  5. Wilcoxon rank sum and signed rank tests (wilcox.test). R stats package documentation. (last accessed: 9/23/2014)

............................................................



Published electronically: 10/15/2014
Conflict of Interest Disclosures: none

 
