How to perform a Durbin-Watson Test in Python

The Durbin-Watson test is an inferential statistic used to detect autocorrelation in the residuals of a regression model. In Python it can be performed using the statsmodels.stats.stattools.durbin_watson() function, which takes the residuals from a regression as input and returns a statistic ranging from 0 to 4, where values close to 2 indicate no autocorrelation and values close to 0 or 4 indicate positive and negative autocorrelation, respectively.


One of the key assumptions of linear regression is that there is no correlation between the residuals. In other words, the residuals are assumed to be independent.

One way to determine if this assumption is met is to perform a Durbin-Watson test, which is used to detect the presence of autocorrelation in the residuals of a regression model. This test uses the following hypotheses:

H0 (null hypothesis): There is no correlation among the residuals.

HA (alternative hypothesis): The residuals are autocorrelated.

The test statistic is approximately equal to 2*(1-r) where r is the sample autocorrelation of the residuals. Thus, the test statistic will always be between 0 and 4 with the following interpretation:

  • A test statistic of 2 indicates no serial correlation.
  • The closer the test statistic is to 0, the more evidence of positive serial correlation.
  • The closer the test statistic is to 4, the more evidence of negative serial correlation.

As a rule of thumb, test statistic values in the range of 1.5 to 2.5 are considered normal. Values outside of this range could indicate that autocorrelation is a problem.
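To see where the statistic comes from, the following sketch computes the Durbin-Watson statistic directly from its definition (the sum of squared successive differences of the residuals divided by the sum of squared residuals) and compares it to the 2*(1-r) approximation. The residual values here are hypothetical, chosen only for illustration:

```python
import numpy as np

# hypothetical residuals for illustration (alternating signs suggest
# negative serial correlation)
resid = np.array([0.5, -0.3, 0.2, -0.6, 0.4, -0.1, 0.3, -0.4])

# Durbin-Watson statistic: sum of squared successive differences
# divided by the sum of squared residuals
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# lag-1 sample autocorrelation of the residuals
r = np.sum(resid[1:] * resid[:-1]) / np.sum(resid ** 2)

print(dw)           # exact statistic, here well above 2
print(2 * (1 - r))  # the 2*(1-r) approximation
```

Because the residuals alternate in sign, r is negative and the statistic lands above 2, matching the interpretation above. The approximation is rough for such a small sample but improves as the number of residuals grows.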

This tutorial explains how to perform a Durbin-Watson test in Python.

Example: Durbin-Watson Test in Python

Suppose we have the following dataset that describes the attributes of 10 basketball players:

import numpy as np
import pandas as pd

#create dataset
df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

#view dataset
df

	rating	points	assists	rebounds
0	90	25	5	11
1	85	20	7	8
2	82	14	7	10
3	88	16	8	6
4	94	27	5	6
5	90	20	7	9
6	76	12	6	6
7	75	15	9	10
8	87	14	9	10
9	86	19	5	7

Suppose we fit a multiple linear regression model using rating as the response variable and the other three variables as the predictor variables:

from statsmodels.formula.api import ols

#fit multiple linear regression model
model = ols('rating ~ points + assists + rebounds', data=df).fit()

#view model summary
print(model.summary())

We can perform a Durbin-Watson test using the durbin_watson() function from the statsmodels library to determine if the residuals of the regression model are autocorrelated:

from statsmodels.stats.stattools import durbin_watson

#perform Durbin-Watson test
durbin_watson(model.resid)

2.392

The test statistic is 2.392. Since this is between 1.5 and 2.5, we would consider autocorrelation not to be problematic in this regression model.
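The rule-of-thumb interpretation above can be wrapped in a small helper so the decision is explicit in code. This is only a sketch of the tutorial's heuristic, not a formal significance test (which would require comparing against Durbin-Watson critical values):

```python
def interpret_dw(dw):
    """Rule-of-thumb interpretation of a Durbin-Watson statistic."""
    if 1.5 <= dw <= 2.5:
        return 'no problematic autocorrelation'
    elif dw < 1.5:
        return 'possible positive autocorrelation'
    else:
        return 'possible negative autocorrelation'

# statistic from the example above
print(interpret_dw(2.392))  # no problematic autocorrelation
```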

How to Handle Autocorrelation

1. For positive serial correlation, consider adding lags of the dependent and/or independent variable to the model.

2. For negative serial correlation, check to make sure that none of your variables are overdifferenced.

3. For seasonal correlation, consider adding seasonal dummy variables to the model.
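As a sketch of the first remedy, adding a lag of the dependent variable just means shifting the response column by one period and including it as an extra predictor. The data below are hypothetical; after building the lagged column you would refit with a formula such as 'y ~ x + y_lag1':

```python
import pandas as pd

# hypothetical time-ordered data for illustration
df = pd.DataFrame({'y': [3.1, 3.4, 3.9, 4.2, 4.8, 5.1, 5.6],
                   'x': [1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5]})

# add a one-period lag of the response as an extra predictor
df['y_lag1'] = df['y'].shift(1)

# the first row has no lagged value, so drop it before refitting
df = df.dropna().reset_index(drop=True)

print(df)
```

Lags of the independent variables can be added the same way with shift() on those columns.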
