How can you perform an ANCOVA in Python?

ANCOVA (analysis of covariance) is a statistical technique for comparing the means of two or more groups while controlling for one or more continuous variables, called covariates. In Python it can be performed with the statsmodels library. First, import the necessary modules and load the data into a pandas DataFrame. Next, specify the model: the dependent (response) variable, the categorical grouping factor, and the covariate(s). Then fit the model with the OLS (ordinary least squares) function and request an ANOVA table for the fitted model. The table reports an F-test for the grouping factor, adjusted for the covariate, and an F-test for the covariate itself. Adjusted group means can be computed from the fitted model, and post-hoc tests can be run to determine which groups differ. Overall, ANCOVA in Python makes it possible to test for group differences in a continuous outcome while holding the covariates constant.

Perform an ANCOVA in Python


An ANCOVA (“analysis of covariance”) is used to determine whether or not there is a statistically significant difference between the means of two or more independent groups, after controlling for one or more covariates.

This tutorial explains how to perform an ANCOVA in Python.

Example: ANCOVA in Python

A teacher wants to know if three different studying techniques have an impact on exam scores, but she wants to account for the current grade that the student already has in the class.

She will perform an ANCOVA using the following variables:

  • Factor variable: studying technique
  • Covariate: current grade
  • Response variable: exam score

Use the following steps to perform an ANCOVA on this dataset:

Step 1: Enter the data.

First, we’ll create a pandas DataFrame to hold our data:

import numpy as np
import pandas as pd

#create data
df = pd.DataFrame({'technique': np.repeat(['A', 'B', 'C'], 5),
                   'current_grade': [67, 88, 75, 77, 85,
                                     92, 69, 77, 74, 88, 
                                     96, 91, 88, 82, 80],
                   'exam_score': [77, 89, 72, 74, 69,
                                  78, 88, 93, 94, 90,
                                  85, 81, 83, 88, 79]})
#view data 
df

   technique	current_grade	exam_score
0	   A	           67	        77
1	   A	           88	        89
2	   A	           75	        72
3	   A	           77	        74
4	   A	           85	        69
5	   B	           92	        78
6	   B	           69	        88
7	   B	           77	        93
8	   B	           74	        94
9	   B	           88	        90
10	   C	           96	        85
11	   C	           91	        81
12	   C	           88	        83
13	   C	           82	        88
14	   C	           80	        79

Step 2: Perform the ANCOVA.

Next, we’ll perform an ANCOVA using the ancova() function from the pingouin library:

#install pingouin first if needed by running this in a terminal: pip install pingouin
from pingouin import ancova

#perform ANCOVA
ancova(data=df, dv='exam_score', covar='current_grade', between='technique')


          Source          SS  DF        F    p-unc      np2
0      technique  390.575130   2  4.80997  0.03155  0.46653
1  current_grade    4.193886   1  0.10329  0.75393  0.00930
2       Residual  446.606114  11      NaN      NaN      NaN
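To see where the technique row of this table comes from, the sketch below rebuilds its SS, F and np2 values with plain NumPy by comparing two least-squares fits: one with the covariate only, and one with the covariate plus the technique factor. The helper residual_ss and the dummy-coding with A as the reference level are choices made for this illustration, not part of pingouin.

```python
import numpy as np

# Same example data as in Step 1
technique = np.repeat(['A', 'B', 'C'], 5)
current_grade = np.array([67, 88, 75, 77, 85, 92, 69, 77, 74, 88,
                          96, 91, 88, 82, 80], dtype=float)
exam_score = np.array([77, 89, 72, 74, 69, 78, 88, 93, 94, 90,
                       85, 81, 83, 88, 79], dtype=float)

def residual_ss(X, y):
    """Residual sum of squares of an ordinary least squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

n = len(exam_score)
intercept = np.ones(n)
# Dummy columns for techniques B and C (A is the reference level)
dummies = np.column_stack([(technique == g).astype(float) for g in ['B', 'C']])

X_full = np.column_stack([intercept, dummies, current_grade])   # factor + covariate
X_reduced = np.column_stack([intercept, current_grade])         # covariate only

rss_full = residual_ss(X_full, exam_score)
rss_reduced = residual_ss(X_reduced, exam_score)

# Sum of squares explained by technique after the covariate is in the model
ss_technique = rss_reduced - rss_full
df_technique = dummies.shape[1]      # number of groups minus 1
df_resid = n - X_full.shape[1]       # observations minus fitted parameters

f_stat = (ss_technique / df_technique) / (rss_full / df_resid)
np2 = ss_technique / (ss_technique + rss_full)   # partial eta-squared

print(round(ss_technique, 3), df_technique, df_resid, round(f_stat, 5), round(np2, 5))
```

The F statistic is the mean square for technique divided by the residual mean square, and np2 is the share of the variance not explained by the covariate that technique accounts for.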

Step 3: Interpret the results.

From the ANCOVA table we see that the p-value (p-unc = “uncorrected p-value”) for studying technique is 0.03155. Since this value is less than 0.05, we can reject the null hypothesis that each of the studying techniques leads to the same average exam score, even after accounting for the student’s current grade in the class. The covariate itself, current grade, is not a statistically significant predictor of exam score in this dataset (p-unc = 0.75393).
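The ANCOVA table says that the techniques differ but not which one does best. One common follow-up is to compare covariate-adjusted group means. The sketch below is an illustration with plain NumPy, assuming the usual common-slope ANCOVA model: it estimates the pooled within-group slope of exam score on current grade, then shifts each group's mean exam score to what it would be at the overall average current grade. The variable names are made up for this example.

```python
import numpy as np

# Same example data as in Step 1
technique = np.repeat(['A', 'B', 'C'], 5)
current_grade = np.array([67, 88, 75, 77, 85, 92, 69, 77, 74, 88,
                          96, 91, 88, 82, 80], dtype=float)
exam_score = np.array([77, 89, 72, 74, 69, 78, 88, 93, 94, 90,
                       85, 81, 83, 88, 79], dtype=float)

groups = ['A', 'B', 'C']

# Pooled within-group slope of exam_score on current_grade
sxy = 0.0
sxx = 0.0
for g in groups:
    mask = technique == g
    dx = current_grade[mask] - current_grade[mask].mean()
    dy = exam_score[mask] - exam_score[mask].mean()
    sxy += float(dx @ dy)
    sxx += float(dx @ dx)
slope = sxy / sxx

# Adjusted mean: group mean exam score evaluated at the overall mean current grade
grand_mean_grade = current_grade.mean()
adjusted_means = {}
for g in groups:
    mask = technique == g
    shift = slope * (current_grade[mask].mean() - grand_mean_grade)
    adjusted_means[g] = exam_score[mask].mean() - shift

for g in groups:
    print(g, round(adjusted_means[g], 2))
```

With this data the adjusted means come out at roughly 75.9 for A, 88.5 for B and 83.6 for C, so technique B has the highest adjusted exam score. These are descriptive values only; a formal post-hoc test would be needed to say which pairs of techniques differ significantly.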
