How to Perform a Two-Way ANOVA in Python

A two-way ANOVA in Python can be performed with the statsmodels library. Its statsmodels.formula.api module provides the ols() function (ols stands for ordinary least squares), which fits a linear model to a dataset from a formula string. The fitted model is then passed to the anova_lm() function, which returns an ANOVA table containing the sum of squares, degrees of freedom, F statistic, and p-value for each term in the model.


A two-way ANOVA is used to determine whether or not there is a statistically significant difference between the means of three or more independent groups that have been split on two factors.

The purpose of a two-way ANOVA is to determine how two factors impact a response variable, and to determine whether or not there is an interaction between the two factors on the response variable.
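One common way to write down the model behind a two-way ANOVA is shown below. The symbols are notation introduced here for illustration and are not part of the original example: Y is the response, i indexes the levels of the first factor, j the levels of the second factor, and k the observations within each combination.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}

% \mu                 : overall mean of the response
% \alpha_i            : effect of level i of the first factor
% \beta_j             : effect of level j of the second factor
% (\alpha\beta)_{ij}  : interaction effect of the two factors
% \varepsilon_{ijk}   : random error term

% The three null hypotheses tested are:
H_0^{(1)}: \alpha_i = 0 \ \text{for all } i, \qquad
H_0^{(2)}: \beta_j = 0 \ \text{for all } j, \qquad
H_0^{(3)}: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
```

Each null hypothesis corresponds to one row of the ANOVA table: one for each factor and one for the interaction.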

This tutorial explains how to conduct a two-way ANOVA in Python.

Example: Two-Way ANOVA in Python

A botanist wants to know whether or not plant growth is influenced by sunlight exposure and watering frequency. She plants 30 seeds and lets them grow for two months under different conditions for sunlight exposure and watering frequency. After two months, she records the height of each plant, in inches.

Use the following steps to perform a two-way ANOVA to determine if watering frequency and sunlight exposure have a significant effect on plant growth, and to determine if there is any interaction effect between watering frequency and sunlight exposure.

Step 1: Enter the data.

First, we’ll create a pandas DataFrame that contains the following three variables:

  • water: how frequently each plant was watered: daily or weekly
  • sun: how much sunlight exposure each plant received: low, medium, or high
  • height: the height of each plant (in inches) after two months

import numpy as np
import pandas as pd

#create data
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

#view first ten rows of data
df.head(10)

   water  sun  height
0  daily  low       6
1  daily  low       6
2  daily  low       6
3  daily  low       5
4  daily  low       6
5  daily  med       5
6  daily  med       5
7  daily  med       6
8  daily  med       4
9  daily  med       5
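Before fitting the model, it can help to check how the observations are spread across the combinations of the two factors. The following is a minimal sketch that rebuilds the same DataFrame and summarizes each water and sun combination; the names cell_counts and cell_means are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# rebuild the example data so this block runs on its own
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

# number of plants in each water x sun combination
cell_counts = df.groupby(['water', 'sun']).size().unstack()

# mean height in each water x sun combination
cell_means = df.groupby(['water', 'sun'])['height'].mean().unstack()

print(cell_counts)
print(cell_means)
```

Equal counts in every cell indicate a balanced design, which is the case for this dataset.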

Step 2: Perform the two-way ANOVA.

Next, we’ll perform the two-way ANOVA using the anova_lm() function from the statsmodels library:

import statsmodels.api as sm
from statsmodels.formula.api import ols

#perform two-way ANOVA
model = ols('height ~ C(water) + C(sun) + C(water):C(sun)', data=df).fit()
sm.stats.anova_lm(model, typ=2)

                    sum_sq    df        F    PR(>F)
C(water)          8.533333   1.0  16.0000  0.000527
C(sun)           24.866667   2.0  23.3125  0.000002
C(water):C(sun)   2.466667   2.0   2.3125  0.120667
Residual         12.800000  24.0      NaN       NaN
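To see where the numbers in the table come from, the sums of squares and F statistics can be reproduced by hand with pandas. This is only a sketch: it relies on the design being balanced (five plants per combination), in which case these simple formulas agree with the table above. The helper main_effect_ss and the variable names are hypothetical and not part of statsmodels.

```python
import numpy as np
import pandas as pd

# rebuild the example data so this block runs on its own
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

grand_mean = df['height'].mean()

def main_effect_ss(factor):
    # sum over levels of: group size * (level mean - grand mean)^2
    g = df.groupby(factor)['height']
    return float((g.size() * (g.mean() - grand_mean) ** 2).sum())

ss_water = main_effect_ss('water')
ss_sun = main_effect_ss('sun')

# residual: squared deviation of each plant from its own cell mean
cell_mean = df.groupby(['water', 'sun'])['height'].transform('mean')
ss_resid = float(((df['height'] - cell_mean) ** 2).sum())

# interaction: whatever total variation the other terms do not explain
ss_total = float(((df['height'] - grand_mean) ** 2).sum())
ss_inter = ss_total - ss_water - ss_sun - ss_resid

# degrees of freedom: 1 for water, 2 for sun, 2 for the interaction,
# and 30 observations minus 6 cells for the residual
df_resid = len(df) - 2 * 3
ms_resid = ss_resid / df_resid

f_water = (ss_water / 1) / ms_resid
f_sun = (ss_sun / 2) / ms_resid
f_inter = (ss_inter / 2) / ms_resid

print(ss_water, ss_sun, ss_inter, ss_resid)
print(f_water, f_sun, f_inter)
```

Each F statistic is the mean square of a term (its sum of squares divided by its degrees of freedom) divided by the residual mean square.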

Step 3: Interpret the results.

We can see the following p-values for each of the factors in the table:

  • water: p-value = .000527
  • sun: p-value = .000002
  • water*sun: p-value = .120667

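Since the p-values for water and sun are both less than .05, both factors have a statistically significant effect on plant height.

Since the p-value for the interaction term (.120667) is not less than .05, there is no statistically significant interaction effect between watering frequency and sunlight exposure.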

Note: Although the ANOVA results tell us that watering frequency and sunlight exposure have a statistically significant effect on plant height, we would need to perform post-hoc tests to determine exactly how different levels of water and sunlight affect plant height.
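As one possible follow-up, pairwise comparisons between the levels of a factor could be sketched with Tukey's HSD test from statsmodels. This choice of test is an assumption made here for illustration, not something the example prescribes. Note also that this call compares the sunlight levels on their own and does not adjust for watering frequency.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# rebuild the example data so this block runs on its own
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

# compare mean height between every pair of sunlight levels
tukey = pairwise_tukeyhsd(endog=df['height'], groups=df['sun'], alpha=0.05)

# one row per pair: mean difference, adjusted p-value, confidence interval
print(tukey.summary())
```

The reject column of the summary shows which pairs of sunlight levels differ significantly in mean height. Since water has only two levels, its ANOVA result already describes the single daily versus weekly comparison.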
