How to Perform Welch’s ANOVA in Python (Step-by-Step)

Welch’s ANOVA is a form of one-way analysis of variance (ANOVA) that can be used when the variances of the groups being compared are unequal. To perform Welch’s ANOVA in Python, you can first check whether the group variances are equal using Bartlett’s test from SciPy, then use the welch_anova() function from the Pingouin package to test the null hypothesis that all group means are equal. Finally, you can interpret the F statistic and p-value, and follow up with a Games-Howell post-hoc test to determine which group means differ.


Welch’s ANOVA is an alternative to the typical one-way ANOVA when the assumption of equal variances is violated.

The following step-by-step example shows how to perform Welch’s ANOVA in Python.

Step 1: Create the Data

To determine if three different studying techniques lead to different exam scores, a professor randomly assigns 10 students to use each technique (Technique A, B, or C) for one week and then makes each student take an exam of equal difficulty. 

The exam scores of the 30 students are shown below:

A = [64, 66, 68, 75, 78, 94, 98, 79, 71, 80]
B = [91, 92, 93, 90, 97, 94, 82, 88, 95, 96]
C = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]
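Before running a formal test, it can help to look at each group's mean and sample variance. The snippet below is a minimal sketch using only Python's built-in statistics module; the names groups and summary are introduced here for illustration and are not part of the original example.

```python
from statistics import mean, variance

A = [64, 66, 68, 75, 78, 94, 98, 79, 71, 80]
B = [91, 92, 93, 90, 97, 94, 82, 88, 95, 96]
C = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

# Hypothetical helper structure: map each technique to its scores
groups = {"A": A, "B": B, "C": C}

# Sample mean and sample variance (n - 1 denominator) for each technique
summary = {name: (mean(scores), variance(scores))
           for name, scores in groups.items()}

for name, (group_mean, group_var) in summary.items():
    print(f"{name}: mean = {group_mean:.1f}, variance = {group_var:.2f}")
```

The sample variance of group A is several times larger than that of groups B and C, which is a first hint that the equal-variance assumption may not hold.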

Step 2: Test for Equal Variances

Next, we can perform Bartlett’s test to determine if the variances of the groups are equal.

If the p-value of the test is less than some significance level (like α = .05), then we can reject the null hypothesis and conclude that not all groups have the same variance.

We can use the following code to perform Bartlett’s test in Python:

import scipy.stats as stats

#perform Bartlett's test 
stats.bartlett(A, B, C)

BartlettResult(statistic=9.039674395, pvalue=0.010890796567)

The p-value (.01089) from Bartlett’s test is less than α = .05, which means we can reject the null hypothesis that each group has the same variance.

Thus, the assumption of equal variances is violated and we can proceed to perform Welch’s ANOVA.
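The decision rule described in this step can also be written out in code. This is a minimal sketch: the names alpha and variances_differ are illustrative choices, and it relies on the result of stats.bartlett() unpacking into a statistic and a p-value.

```python
import scipy.stats as stats

A = [64, 66, 68, 75, 78, 94, 98, 79, 71, 80]
B = [91, 92, 93, 90, 97, 94, 82, 88, 95, 96]
C = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

alpha = 0.05  # significance level used in this example

# The result of bartlett() unpacks into the test statistic and the p-value
statistic, pvalue = stats.bartlett(A, B, C)

# Reject the null hypothesis of equal variances when the p-value is below alpha
variances_differ = pvalue < alpha

if variances_differ:
    print(f"p = {pvalue:.4f} < {alpha}: variances differ, use Welch's ANOVA")
else:
    print(f"p = {pvalue:.4f} >= {alpha}: no evidence of unequal variances")
```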

Step 3: Perform Welch’s ANOVA

To perform Welch’s ANOVA in Python, we can use the welch_anova() function from the Pingouin package.

First, we need to install Pingouin:

pip install pingouin

Next, we can create a DataFrame of the exam scores and perform Welch’s ANOVA:

import pingouin as pg
import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'score': [64, 66, 68, 75, 78, 94, 98, 79, 71, 80,
                             91, 92, 93, 90, 97, 94, 82, 88, 95, 96,
                             79, 78, 88, 94, 92, 85, 83, 85, 82, 81],
                   'group': np.repeat(['a', 'b', 'c'], repeats=10)}) 

#perform Welch's ANOVA
pg.welch_anova(dv='score', between='group', data=df)

  Source  ddof1      ddof2         F     p-unc       np2
0  group      2  16.651295  9.717185  0.001598  0.399286

The overall p-value (.001598) from the ANOVA table is less than α = .05, which means we can reject the null hypothesis that the mean exam scores are equal across the three studying techniques.
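To see where the F statistic and the denominator degrees of freedom (ddof2) come from, the textbook Welch formula can be computed by hand with NumPy and SciPy. This is a sketch for illustration only, not Pingouin's internal code; all variable names are assumptions made for this example.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([64, 66, 68, 75, 78, 94, 98, 79, 71, 80]),  # technique A
    np.array([91, 92, 93, 90, 97, 94, 82, 88, 95, 96]),  # technique B
    np.array([79, 78, 88, 94, 92, 85, 83, 85, 82, 81]),  # technique C
]

k = len(groups)                                         # number of groups
n = np.array([len(g) for g in groups])                  # group sizes
means = np.array([g.mean() for g in groups])            # group means
variances = np.array([g.var(ddof=1) for g in groups])   # sample variances

# Weight each group by n_i / s_i^2, so noisier groups count less
w = n / variances
weighted_mean = np.sum(w * means) / np.sum(w)

# Weighted between-group variability
numerator = np.sum(w * (means - weighted_mean) ** 2) / (k - 1)

# Correction term that accounts for the unequal variances
lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

F = numerator / denominator
df1 = k - 1
df2 = (k ** 2 - 1) / (3 * lam)

# Upper-tail probability of the F distribution
p_value = stats.f.sf(F, df1, df2)

print(f"F = {F:.4f}, df1 = {df1}, df2 = {df2:.4f}, p = {p_value:.6f}")
```

The values of F, df2, and the p-value should agree with the F, ddof2, and p-unc columns reported by welch_anova().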

We can then perform the Games-Howell post-hoc test to determine exactly which group means are different:

pg.pairwise_gameshowell(dv='score', between='group', data=df)


   A  B  mean(A)  mean(B)   diff        se          T       df    pval
0  a  b     77.3     91.8  -14.5  3.843754  -3.772354  11.6767  0.0072
1  a  c     77.3     84.7   -7.4  3.952777  -1.872102  12.7528  0.1864
2  b  c     91.8     84.7    7.1  2.179959   3.256942  17.4419  0.0119

From the p-values we can see that the means of groups a and b are significantly different (p = .0072), as are the means of groups b and c (p = .0119). The difference between groups a and c is not statistically significant (p = .1864).
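For readers curious how the T, df, and pval columns are obtained, the sketch below works through the standard Games-Howell calculation by hand. It assumes SciPy 1.7 or later (for stats.studentized_range) and is an illustration of the textbook formulas rather than Pingouin's own implementation; the dictionary names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats

groups = {
    "a": np.array([64, 66, 68, 75, 78, 94, 98, 79, 71, 80]),
    "b": np.array([91, 92, 93, 90, 97, 94, 82, 88, 95, 96]),
    "c": np.array([79, 78, 88, 94, 92, 85, 83, 85, 82, 81]),
}
k = len(groups)  # number of groups being compared

results = {}
for g1, g2 in combinations(groups, 2):
    x, y = groups[g1], groups[g2]

    # Squared standard error of each group mean (s^2 / n)
    vx = x.var(ddof=1) / len(x)
    vy = y.var(ddof=1) / len(y)

    # Welch t statistic for the pair
    se = np.sqrt(vx + vy)
    t = (x.mean() - y.mean()) / se

    # Welch-Satterthwaite degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))

    # p-value from the studentized range distribution, with q = |t| * sqrt(2)
    p = stats.studentized_range.sf(abs(t) * np.sqrt(2), k, df)

    results[(g1, g2)] = (t, df, p)
    print(f"{g1} vs {g2}: T = {t:.4f}, df = {df:.4f}, p = {p:.4f}")
```

Using the studentized range distribution with k groups, rather than a plain t distribution, is what adjusts the p-values for the fact that several pairs are compared at once.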
