How to Perform White’s Test in Python (Step-by-Step)

White’s test can be performed in Python using the het_white() function from the statsmodels library. In outline: fit a linear regression model with OLS and obtain its residuals, pass the residuals and the model's predictor matrix to het_white(), and compare the resulting p-value to a chosen significance level to decide whether there is evidence of heteroscedasticity.


White’s test is used to determine if heteroscedasticity is present in a regression model.

Heteroscedasticity refers to the unequal scatter of residuals at different levels of a response variable, which violates the assumption that the residuals are equally scattered at each level of the response variable.
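To make the definition concrete, here is a minimal sketch on simulated data (the data, the seed, and the helper name residual_spread are illustrative assumptions, not part of the example that follows). It compares the spread of the residuals at low and high fitted values, once for errors with constant variance and once for errors whose spread grows with the predictor:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1, 10, size=n)

# Two hypothetical error terms: constant spread vs. spread that grows with x
e_const = rng.normal(0, 2.0, size=n)
e_grow = rng.normal(0, 1.0, size=n) * 0.5 * x

def residual_spread(x, y):
    """Fit y = a + b*x by least squares and return the residual standard
    deviation in the lower and upper half of the fitted values."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    low = fitted <= np.median(fitted)
    return resid[low].std(), resid[~low].std()

spread_const = residual_spread(x, 1 + 2 * x + e_const)
spread_grow = residual_spread(x, 1 + 2 * x + e_grow)

print("constant-variance errors (low, high):", spread_const)
print("growing-variance errors  (low, high):", spread_grow)
```

In the first case the two standard deviations should be close to each other; in the second, the residuals at high fitted values should be clearly more spread out, which is the pattern White’s test is designed to detect.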

The following step-by-step example shows how to perform White’s test in Python to determine whether or not heteroscedasticity is a problem in a given regression model.

Step 1: Load Data

In this example we will fit a multiple linear regression model using the mtcars dataset.

The following code shows how to load this dataset into a pandas DataFrame:

from statsmodels.stats.diagnostic import het_white
import statsmodels.api as sm
import pandas as pd

#define URL where dataset is located
url = "https://raw.githubusercontent.com/arabpsychology/Python-Guides/main/mtcars.csv"

#read in data
data = pd.read_csv(url)

#view summary of data
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   model   32 non-null     object 
 1   mpg     32 non-null     float64
 2   cyl     32 non-null     int64  
 3   disp    32 non-null     float64
 4   hp      32 non-null     int64  
 5   drat    32 non-null     float64
 6   wt      32 non-null     float64
 7   qsec    32 non-null     float64
 8   vs      32 non-null     int64  
 9   am      32 non-null     int64  
 10  gear    32 non-null     int64  
 11  carb    32 non-null     int64  
dtypes: float64(5), int64(6), object(1)

Step 2: Fit Regression Model

Next, we will fit a regression model using mpg as the response variable and disp and hp as the two predictor variables:

#define response variable
y = data['mpg']

#define predictor variables
x = data[['disp', 'hp']]

#add constant to predictor variables
x = sm.add_constant(x)

#fit regression model
model = sm.OLS(y, x).fit()

Step 3: Perform White’s Test

Next, we will use the het_white() function from the statsmodels package to perform White’s test to determine if heteroscedasticity is present in the regression model:

#perform White's test
white_test = het_white(model.resid, model.model.exog)

#define labels to use for output of White's test
labels = ['Test Statistic', 'Test Statistic p-value', 'F-Statistic', 'F-Test p-value']

#print results of White's test
print(dict(zip(labels, white_test)))

{'Test Statistic': 7.076620330416624, 'Test Statistic p-value': 0.21500404394263936,
 'F-Statistic': 1.4764621093131864, 'F-Test p-value': 0.23147065943879694}

Here is how to interpret the output:

  • The test statistic is χ² = 7.0766.
  • The corresponding p-value is 0.215.

White’s test uses the following null and alternative hypotheses:

  • Null (H0): Homoscedasticity is present (residuals are equally scattered)
  • Alternative (HA): Heteroscedasticity is present (residuals are not equally scattered)

Since the p-value (0.215) is not less than 0.05, we fail to reject the null hypothesis. This means we do not have sufficient evidence to say that heteroscedasticity is present in the regression model.

What To Do Next

If you fail to reject the null hypothesis of White’s test, there is no evidence that heteroscedasticity is present and you can proceed to interpret the output of the original regression.

However, if you reject the null hypothesis, this means heteroscedasticity is present. In this case, the standard errors that are shown in the output table of the regression may be unreliable.

There are two common ways to fix this issue:

1. Transform the response variable.

You can try performing a transformation on the response variable, such as taking the log or square root of the response variable. This often causes heteroscedasticity to go away.

2. Use weighted regression.

Weighted regression assigns a weight to each data point based on the variance of its fitted value. Essentially, this gives small weights to data points that have higher variances, which shrinks their squared residuals. When the proper weights are used, this can eliminate the problem of heteroscedasticity.
