How to perform a Kruskal-Wallis test in SAS

To perform a Kruskal-Wallis test in SAS, first arrange the data with one row per observation, containing a grouping (classification) variable and a response variable. The test is run with `PROC NPAR1WAY`, the SAS procedure for non-parametric one-way analysis. Name the grouping variable in the `CLASS` statement and the response variable in the `VAR` statement, and add the WILCOXON option to the `PROC NPAR1WAY` statement. When the classification variable has more than two levels, the Wilcoxon analysis reports the Kruskal-Wallis test; the same test also appears in the default output when no analysis options are given. The output includes the test statistic (evaluated against a chi-square distribution), its degrees of freedom, and the p-value. If a multiple comparison option such as DSCF is requested, the output also contains a table of pairwise comparisons that shows where the significant differences among the groups lie.


Introduction to the Kruskal-Wallis Test

The Kruskal-Wallis test (sometimes called the Kruskal-Wallis H test) is used to assess whether there is a statistically significant difference among the population medians of three or more independent groups. It is most useful when the assumptions required for standard parametric procedures are not met.

It is widely regarded as the non-parametric equivalent of the one-way Analysis of Variance (ANOVA). While ANOVA assumes that the data within each group are normally distributed and that group variances are homogeneous, the Kruskal-Wallis test does not assume any particular distribution. This makes it a common choice for ordinal data, or for continuous data that clearly violate the normality assumption or contain influential outliers. The test works by ranking all observations across all groups combined and then comparing the sum of ranks within each group.
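For reference, the ranking principle described above corresponds to the standard textbook form of the H statistic. The notation below ($N$, $k$, $n_i$, $R_i$, $t_j$, $H'$, $C$) is introduced here for illustration and is not part of the original tutorial.

```latex
% N   = total number of observations across all groups
% k   = number of groups
% n_i = number of observations in group i
% R_i = sum of the ranks in group i (tied values receive the average rank)
H' = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)

% Tie correction: t_j is the number of observations sharing the j-th tied value
C = 1 - \frac{\sum_j \left( t_j^3 - t_j \right)}{N^3 - N},
\qquad
H = \frac{H'}{C}

% Under the null hypothesis, H is approximately chi-square with k - 1 degrees of freedom
```

When there are no ties, $C = 1$ and $H = H'$.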

Knowing when to use this test matters. If your study design has a single independent factor with three or more levels (groups) and a continuous or ordinal dependent variable, but you cannot rely on the assumption of normally distributed data, the Kruskal-Wallis test is an appropriate alternative. This tutorial walks through a step-by-step example of the analysis in SAS.

Understanding the `PROC NPAR1WAY` Statement in SAS

In SAS, non-parametric comparisons of a response across groups are handled by the `PROC NPAR1WAY` procedure. It supports several non-parametric tests, including the Wilcoxon rank-sum test (for two groups) and the Kruskal-Wallis test (for three or more groups).

The core syntax of `PROC NPAR1WAY` requires both a classification variable (the grouping variable) and a response variable (the variable being measured). When no analysis options are given, the procedure runs all of its default analyses, and the Kruskal-Wallis test appears among them whenever the classification variable has three or more levels. Post-hoc comparisons are not produced by default, so they must be requested explicitly with an option in the procedure statement.

We recommend adding the WILCOXON and DSCF options. The WILCOXON option requests the analysis of Wilcoxon scores (rank sums), which is the basis of the Kruskal-Wallis H statistic. The DSCF option requests the Dwass, Steel, Critchlow-Fligner multiple comparison procedure, a widely used method for pairwise comparisons following a significant global Kruskal-Wallis result. It reports p-values adjusted to control the family-wise error rate.

Case Study Setup: Fertilizer and Plant Growth

To illustrate the Kruskal-Wallis test, we use a research scenario from agricultural science. Suppose a team of researchers wants to know whether three types of fertilizer (labeled Fert1, Fert2, and Fert3) lead to statistically different levels of plant growth. Because preliminary checks suggest the growth measurements may not be normally distributed, or because the sample sizes are small, they opt for the non-parametric Kruskal-Wallis test.

The study design involves randomly selecting a total of 30 young plants. These plants are then randomly divided into three equal groups, each containing 10 plants. Each group receives one of the three fertilizers. To ensure experimental rigor, all other confounding factors, such as soil type, light exposure, and watering schedule, are meticulously controlled across all groups. After a standardized treatment period of one month, the researchers carefully measure the total height increase (growth, measured in inches) for every individual plant.

The objective is to determine whether the observed differences in median plant growth among the three fertilizer groups are large enough to reject the null hypothesis. The null hypothesis here states that the population medians of plant growth are identical for all three fertilizer groups (i.e., fertilizer type has no effect on growth).

Step 1: Preparing and Entering Data in SAS

The initial and critical step in any SAS analysis is the creation of a structured data set. Our data set, which we name fertilizer_data, must contain two primary variables: a categorical variable identifying the fertilizer group (fertilizer) and a continuous variable representing the measured plant growth (growth). The fertilizer variable must be defined as a character variable, indicated by the dollar sign ($) in the `input` statement, as it holds categorical labels (fert1, fert2, fert3).

The following SAS code block demonstrates the necessary steps to create this dataset. We use the `DATA` step to name the dataset, the `INPUT` statement to define the variables and their types, and the `DATALINES` statement to input the actual observations, pairing the fertilizer type with the resulting plant growth measurement. Each pair represents one plant’s observation.

Review the data structure carefully. We have 10 observations for each of the three fertilizer types, totaling 30 rows of data. This ensures we have a balanced design, although the Kruskal-Wallis test does not strictly require equal sample sizes. The code below illustrates the exact syntax required:

/*create dataset: fertilizer_data*/
data fertilizer_data;
    input fertilizer $ growth;
    datalines;
fert1 7
fert1 14
fert1 14
fert1 13
fert1 12
fert1 9
fert1 6
fert1 14
fert1 12
fert1 8
fert2 15
fert2 17
fert2 13
fert2 15
fert2 15
fert2 13
fert2 9
fert2 12
fert2 10
fert2 8
fert3 6
fert3 8
fert3 8
fert3 9
fert3 5
fert3 14
fert3 13
fert3 8
fert3 10
fert3 9
;
run;
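Before running the test, it can help to confirm that the data were read as intended. The following is a minimal sketch, assuming the fertilizer_data set created above; it is an optional check and not part of the original walkthrough.

```sas
/* optional check: count the observations in each fertilizer group */
/* and show the median, minimum, and maximum growth per group      */
proc means data=fertilizer_data n median min max;
    class fertilizer;
    var growth;
run;
```

Each fertilizer group should show an N of 10. A smaller count usually points to a typo in the `DATALINES` block.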

Step 2: Executing the Kruskal-Wallis Test in SAS

Once the data set is successfully created and stored in the SAS environment, the next step involves calling the `PROC NPAR1WAY` procedure to execute the analysis. As detailed previously, we must utilize specific options within this procedure to ensure we perform the Kruskal-Wallis test and obtain meaningful post-hoc results.

The code requires three main elements within the `PROC NPAR1WAY` block: the `DATA=` option, which specifies the input dataset; the `CLASS` statement, which identifies the grouping variable (fertilizer); and the `VAR` statement, which specifies the dependent variable being compared (growth). Additionally, we include the WILCOXON option for rank statistics and the DSCF option for Dwass, Steel, Critchlow-Fligner multiple comparisons, which show where any differences lie.

The purpose of this execution is to test the global null hypothesis that the distribution of growth is the same across all three fertilizer groups. If this global test is significant, we then proceed to interpret the pairwise comparisons generated by the DSCF option. The following SAS code block executes the full Kruskal-Wallis analysis:

/*perform Kruskal-Wallis test and post-hoc comparisons*/
proc npar1way data=fertilizer_data wilcoxon dscf;
    class fertilizer;
    var growth;
run;
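If the results are needed for further processing rather than just for reading, the output tables can be saved as data sets with ODS. The sketch below assumes the ODS table names are KruskalWallisTest and DSCF; treat these as assumptions and confirm them in the log produced by `ods trace on`. The data set names kw_result and dscf_result are hypothetical.

```sas
/* write the name of every output table to the log */
ods trace on;

/* save the two tables of interest as data sets        */
/* (table names are assumed; confirm them in the log)  */
ods output KruskalWallisTest=kw_result DSCF=dscf_result;

proc npar1way data=fertilizer_data wilcoxon dscf;
    class fertilizer;
    var growth;
run;

ods trace off;

/* inspect the saved tables */
proc print data=kw_result;
run;

proc print data=dscf_result;
run;
```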

Step 3: Interpreting the Primary Kruskal-Wallis Results

The output generated by SAS provides multiple tables. We first focus on the table summarizing the results of the global Kruskal-Wallis test. This table presents the overall test statistic, which is compared against a chi-square distribution, along with the corresponding degrees of freedom and, most importantly, the p-value.
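The rank sums behind the test statistic can also be inspected directly. The following is a minimal sketch, assuming the fertilizer_data set from Step 1; the names ranked_growth and growth_rank are hypothetical.

```sas
/* rank all 30 observations together; tied values get the average rank */
proc rank data=fertilizer_data out=ranked_growth ties=mean;
    var growth;
    ranks growth_rank;
run;

/* sum the ranks within each fertilizer group */
proc means data=ranked_growth n sum mean;
    class fertilizer;
    var growth_rank;
run;
```

The per-group sums should agree with the rank sums that `PROC NPAR1WAY` reports in its Wilcoxon scores table.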

The image below illustrates the structure of the primary output table, which summarizes the test statistic:

[Figure: Kruskal-Wallis test table from the SAS output]

For our specific analysis, the calculated p-value of the test is reported as 0.0431. To make a statistical decision, we compare this value against a predetermined significance level ($\alpha$), typically set at 0.05. Since 0.0431 is less than 0.05, we fulfill the criterion for statistical significance. This leads us to the decision to reject the null hypothesis.
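As a check on the reported p-value, the statistic can be reproduced by hand from the 30 observations entered in Step 1. The rank sums and tie counts below were computed from that data using average ranks for ties. The symbols ($R_i$, $n_i$, $N$, $H'$, $C$) are notation introduced here for illustration.

```latex
% Rank sums for fert1, fert2, fert3 (tied values receive the average rank)
R_1 = 153, \quad R_2 = 205, \quad R_3 = 107, \quad n_1 = n_2 = n_3 = 10, \quad N = 30

% Uncorrected statistic
H' = \frac{12}{30 \cdot 31} \cdot \frac{153^2 + 205^2 + 107^2}{10} - 3 \cdot 31
   = \frac{12}{930} \cdot 7688.3 - 93 \approx 6.204

% Tie correction: tied groups of sizes 2, 5, 4, 2, 3, 4, 4, 3 give \sum (t^3 - t) = 360
C = 1 - \frac{360}{30^3 - 30} = 1 - \frac{360}{26970} \approx 0.9867

% Corrected statistic and p-value; for 2 degrees of freedom, P(\chi^2_2 \ge h) = e^{-h/2}
H = \frac{H'}{C} \approx 6.288, \qquad p = e^{-6.288/2} \approx 0.0431
```

The hand calculation agrees with the p-value of 0.0431 reported by SAS.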

Rejecting the null hypothesis means that we have found sufficient statistical evidence to conclude that the population medians of plant growth are not equal across all three fertilizer groups. In plain terms, the type of fertilizer used leads to a statistically significant difference in plant growth. However, this global test does not tell us which specific pairs of fertilizers differ from one another; for that, we must examine the post-hoc tests.

Step 4: Analyzing Pairwise Comparisons (Post-Hoc Analysis)

Because the overall Kruskal-Wallis test was statistically significant (p < 0.05), we are justified in performing pairwise comparisons to identify the specific group differences. These come from the output generated by the DSCF option (Dwass, Steel, Critchlow-Fligner method) requested in our `PROC NPAR1WAY` statement. The procedure calculates an adjusted p-value for every possible pair of groups, which keeps the family-wise error rate under control when several tests are performed.

The subsequent table in the SAS output provides these pairwise comparison results:

[Table: Dwass, Steel, Critchlow-Fligner pairwise comparisons from the SAS output]

We analyze this table by comparing the adjusted p-value for each pair against our significance level of 0.05. The table lists three comparisons: Fert1 vs. Fert2, Fert1 vs. Fert3, and Fert2 vs. Fert3. We observe that the comparison between fertilizer 2 and fertilizer 3 yields a p-value of 0.0390. Since 0.0390 is less than 0.05, we conclude that there is a statistically significant difference in the plant growth distribution between the plants treated with Fertilizer 2 and those treated with Fertilizer 3.

Conversely, the comparisons between Fertilizer 1 and Fertilizer 2 (p-value > 0.05) and between Fertilizer 1 and Fertilizer 3 (p-value > 0.05) are not statistically significant at the 0.05 level. The only pairwise difference detected is therefore the one between Fertilizer 2 and Fertilizer 3, which is consistent with the significant overall Kruskal-Wallis result. This gives the researchers a concrete basis for recommendations about which fertilizers perform differently.
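To look more closely at a single pair, a two-group Wilcoxon rank-sum test can be run on a subset of the data. This is a sketch only, assuming the fertilizer_data set from Step 1. Its p-value is not adjusted for multiple comparisons, so it is not a substitute for the DSCF results.

```sas
/* two-group Wilcoxon rank-sum test for fert2 versus fert3 only   */
/* (unadjusted p-value; use the DSCF table for formal inference)  */
proc npar1way data=fertilizer_data wilcoxon;
    where fertilizer in ('fert2', 'fert3');
    class fertilizer;
    var growth;
run;
```

The `WHERE` statement restricts the analysis to the two named groups, so the procedure reports a two-sample Wilcoxon test instead of a three-group comparison.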

Summary of Statistical Findings and Next Steps

In summary, the Kruskal-Wallis H test in SAS revealed a statistically significant difference in median plant growth across the three fertilizer types (H(2) = 6.29, p = 0.0431). The subsequent Dwass, Steel, Critchlow-Fligner post-hoc tests indicated that this global result is driven by the comparison between Fertilizer 2 and Fertilizer 3.

These findings suggest that Fertilizer 2 and Fertilizer 3 differ in their effect on plant growth, while Fertilizer 1's effect is not statistically distinguishable from either of the other two at the 0.05 level. A natural next step is to examine how Fertilizer 2 and Fertilizer 3 differ in composition or mechanism of action, which may explain the observed variation.

The process demonstrated here serves as a reusable framework for non-parametric comparisons across multiple independent groups in SAS: set up the data with the `DATA` step, run the analysis with `PROC NPAR1WAY` and the appropriate options (WILCOXON, DSCF), and interpret the chi-square statistic and the adjusted p-values.

Cite this article

stats writer (2025). How to perform a Kruskal-Wallis test in SAS. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-to-perform-a-kruskal-wallis-test-in-sas/
