How can I perform Welch’s ANOVA in Python, step-by-step?

Welch’s ANOVA is a statistical test used to compare the means of three or more groups. It is a variation of the classic one-way ANOVA that does not assume equal variances among the groups. In Python, the Pingouin package provides a function called welch_anova() for this test. SciPy’s f_oneway() function, by contrast, is the classic one-way ANOVA, which assumes equal variances. The following is a step-by-step guide on how to perform Welch’s ANOVA in Python:

Step 1: Import the necessary libraries
First, install Pingouin and import it together with pandas and NumPy. The stats module from SciPy is also useful, because it provides Bartlett’s test for checking whether the group variances are equal.

Step 2: Prepare the data
Next, we need to prepare our data in a format that can be used by the welch_anova() function. The data should be in a pandas DataFrame in long format: one column holds the measured values and another column holds the group label for each value.

Step 3: Perform Welch’s ANOVA
Using the welch_anova() function, we can perform Welch’s ANOVA on our data. The function takes the DataFrame, the name of the value column (dv), and the name of the group column (between), and returns a table with the degrees of freedom, the F-statistic, and the p-value.

Step 4: Interpret the results
The F-statistic compares the variability between the group means to the variability within the groups, while the p-value is the probability of seeing an F-statistic at least this large if all group means were equal. A low p-value (typically less than 0.05) indicates that at least one group mean differs significantly from the others.

Step 5: Post-hoc analysis (optional)
If the result of Welch’s ANOVA is significant, we can perform post-hoc analysis to determine which specific groups differ. Because the variances are unequal, the Games-Howell test is a suitable choice; it is available in Pingouin as pairwise_gameshowell(). Tukey’s Honestly Significant Difference (HSD) test assumes equal variances and is therefore less appropriate here.

In conclusion, performing Welch’s ANOVA in Python involves importing the necessary libraries, preparing the data, using the welch_anova() function, and interpreting the results. The worked example below follows these steps on a small dataset.

Welch’s ANOVA in Python (Step-by-Step)


Welch’s ANOVA is an alternative to the typical one-way ANOVA when the assumption of equal variances is violated.

The following step-by-step example shows how to perform Welch’s ANOVA in Python.

Step 1: Create the Data

To determine if three different studying techniques lead to different exam scores, a professor randomly assigns 10 students to use each technique (Technique A, B, or C) for one week and then makes each student take an exam of equal difficulty. 

The exam scores of the 30 students are shown below:

A = [64, 66, 68, 75, 78, 94, 98, 79, 71, 80]
B = [91, 92, 93, 90, 97, 94, 82, 88, 95, 96]
C = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]
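Before running any formal test, it can help to look at the mean and sample variance of each group. The snippet below is a minimal sketch using only Python's standard library; the `groups` and `summary` names are illustrative and not part of the original example.

```python
import statistics

# Exam scores for the three studying techniques
A = [64, 66, 68, 75, 78, 94, 98, 79, 71, 80]
B = [91, 92, 93, 90, 97, 94, 82, 88, 95, 96]
C = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

groups = {"A": A, "B": B, "C": C}

# Map each technique to (mean, sample variance with n - 1 in the denominator)
summary = {
    name: (statistics.mean(scores), statistics.variance(scores))
    for name, scores in groups.items()
}

for name, (mean, var) in summary.items():
    print(f"Technique {name}: mean = {mean:.1f}, sample variance = {var:.2f}")
```

A large gap between the sample variances is an informal hint that the equal-variance assumption may not hold, which a formal test can then check.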

Step 2: Test for Equal Variances

Next, we can perform Bartlett’s test to determine if the variances of the groups are equal.

If the p-value of the test statistic is less than some significance level (like α = .05) then we can reject the null hypothesis and conclude that not all groups have the same variance.

We can use the following code to perform Bartlett’s test in Python:

import scipy.stats as stats

#perform Bartlett's test 
stats.bartlett(A, B, C)

BartlettResult(statistic=9.039674395, pvalue=0.010890796567)

The p-value (.01089) from Bartlett’s test is less than α = .05, which means we can reject the null hypothesis that each group has the same variance.

Thus, the assumption of equal variances is violated and we can proceed to perform Welch’s ANOVA.
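To see what Bartlett's test computes, the statistic can be reproduced by hand and compared with SciPy's result. The following is a sketch under the textbook definition of the test (pooled variance, log-variance contrast, and a chi-square reference distribution with k - 1 degrees of freedom); the helper name `bartlett_statistic` is made up for this illustration.

```python
import numpy as np
from scipy import stats

A = [64, 66, 68, 75, 78, 94, 98, 79, 71, 80]
B = [91, 92, 93, 90, 97, 94, 82, 88, 95, 96]
C = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

def bartlett_statistic(*samples):
    """Hypothetical helper: Bartlett's statistic and p-value computed by hand."""
    k = len(samples)
    sizes = np.array([len(s) for s in samples])
    variances = np.array([np.var(s, ddof=1) for s in samples])
    n_total = sizes.sum()

    # Pooled variance across all groups
    pooled = np.sum((sizes - 1) * variances) / (n_total - k)

    # Contrast between the log pooled variance and the log group variances
    numerator = (n_total - k) * np.log(pooled) - np.sum((sizes - 1) * np.log(variances))

    # Small-sample correction factor
    correction = 1 + (np.sum(1 / (sizes - 1)) - 1 / (n_total - k)) / (3 * (k - 1))

    statistic = numerator / correction
    p_value = stats.chi2.sf(statistic, df=k - 1)
    return statistic, p_value

manual_stat, manual_p = bartlett_statistic(A, B, C)
scipy_stat, scipy_p = stats.bartlett(A, B, C)

alpha = 0.05
variances_differ = manual_p < alpha  # True means we reject equal variances

print(manual_stat, manual_p)
print(scipy_stat, scipy_p)
print("Reject equal variances:", variances_differ)
```

Keep in mind that Bartlett's test is sensitive to departures from normality, so a small p-value can also reflect non-normal data rather than unequal variances alone.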

Step 3: Perform Welch’s ANOVA

To perform Welch’s ANOVA in Python, we can use the welch_anova() function from the Pingouin package.

First, we need to install Pingouin:

pip install pingouin

Next, we can use the following code to perform Welch’s ANOVA:

import pingouin as pg
import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'score': [64, 66, 68, 75, 78, 94, 98, 79, 71, 80,
                             91, 92, 93, 90, 97, 94, 82, 88, 95, 96,
                             79, 78, 88, 94, 92, 85, 83, 85, 82, 81],
                   'group': np.repeat(['a', 'b', 'c'], repeats=10)}) 

#perform Welch's ANOVA
pg.welch_anova(dv='score', between='group', data=df)

  Source  ddof1      ddof2         F     p-unc       np2
0  group      2  16.651295  9.717185  0.001598  0.399286

The overall p-value (.001598) from the ANOVA table is less than α = .05, which means we can reject the null hypothesis that the mean exam scores are equal between the three studying techniques.
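To make the numbers in the ANOVA table less of a black box, here is a sketch that computes Welch's F-statistic, its degrees of freedom, and the p-value directly from the standard formulas, using only NumPy and SciPy. The function name `welch_anova_manual` is invented for this illustration and is not part of Pingouin.

```python
import numpy as np
from scipy import stats

A = [64, 66, 68, 75, 78, 94, 98, 79, 71, 80]
B = [91, 92, 93, 90, 97, 94, 82, 88, 95, 96]
C = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

def welch_anova_manual(*samples):
    """Hypothetical helper: Welch's ANOVA from the textbook formulas."""
    k = len(samples)
    sizes = np.array([len(s) for s in samples], dtype=float)
    means = np.array([np.mean(s) for s in samples])
    variances = np.array([np.var(s, ddof=1) for s in samples])

    # Each group is weighted by n / s^2, so noisy groups count less
    weights = sizes / variances
    weight_sum = weights.sum()
    grand_mean = np.sum(weights * means) / weight_sum

    # Weighted between-group variability
    between = np.sum(weights * (means - grand_mean) ** 2) / (k - 1)

    # Term shared by the F denominator and the second degrees of freedom
    lam = np.sum((1 - weights / weight_sum) ** 2 / (sizes - 1))

    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, df1, df2, p_value

f_stat, df1, df2, p_value = welch_anova_manual(A, B, C)
print(f"F = {f_stat:.6f}, ddof1 = {df1}, ddof2 = {df2:.6f}, p = {p_value:.6f}")
```

The values should agree with the F, ddof1, ddof2, and p-unc columns reported by welch_anova(), which is a useful sanity check if Pingouin is not available in a given environment.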

We can then perform the Games-Howell post-hoc test to determine exactly which group means are different:

pg.pairwise_gameshowell(dv='score', between='group', data=df)


   A  B  mean(A)  mean(B)   diff        se         T       df    pval
0  a  b     77.3     91.8  -14.5  3.843754 -3.772354  11.6767  0.0072
1  a  c     77.3     84.7   -7.4  3.952777 -1.872102  12.7528  0.1864
2  b  c     91.8     84.7    7.1  2.179959  3.256942  17.4419  0.0119

From the p-values we can see that the mean exam scores of groups a and b are significantly different (p = .0072), as are those of groups b and c (p = .0119). The difference between groups a and c is not statistically significant (p = .1864).
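The diff, se, T, and df columns of the Games-Howell table can be reproduced for a single pair of groups with a few lines of NumPy. The sketch below assumes the usual formulation of the test: a Welch-type standard error, Welch-Satterthwaite degrees of freedom, and a p-value from the studentized range distribution (scipy.stats.studentized_range, available in SciPy 1.7 and later). The helper `games_howell_pair` is a made-up name, and its p-value may differ slightly from Pingouin's depending on the version.

```python
import math
import numpy as np
from scipy import stats

A = [64, 66, 68, 75, 78, 94, 98, 79, 71, 80]
B = [91, 92, 93, 90, 97, 94, 82, 88, 95, 96]
C = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

def games_howell_pair(x, y, n_groups):
    """Hypothetical helper: Games-Howell comparison for one pair of groups."""
    nx, ny = len(x), len(y)
    # Squared standard error contributed by each group
    vx = np.var(x, ddof=1) / nx
    vy = np.var(y, ddof=1) / ny

    diff = np.mean(x) - np.mean(y)
    se = math.sqrt(vx + vy)
    t_stat = diff / se

    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))

    # Convert |t| to a studentized range statistic and look up the tail probability
    q = abs(t_stat) * math.sqrt(2)
    p_value = stats.studentized_range.sf(q, n_groups, df)
    return diff, se, t_stat, df, p_value

diff_ab, se_ab, t_ab, df_ab, p_ab = games_howell_pair(A, B, n_groups=3)
diff_ac, se_ac, t_ac, df_ac, p_ac = games_howell_pair(A, C, n_groups=3)

print(f"a vs b: diff = {diff_ab:.1f}, se = {se_ab:.6f}, T = {t_ab:.6f}, df = {df_ab:.4f}, p = {p_ab:.4f}")
print(f"a vs c: diff = {diff_ac:.1f}, se = {se_ac:.6f}, T = {t_ac:.6f}, df = {df_ac:.4f}, p = {p_ac:.4f}")
```

Note that `n_groups` is the total number of groups in the study (three here), not the number of groups in the pair, because the studentized range distribution accounts for all pairwise comparisons.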
