Is there a function in Python to calculate Standardized Residuals?

No, Python has no built-in function to calculate standardized residuals. However, the statsmodels package computes them for a fitted regression model through the model's influence object. With other numerical libraries such as NumPy or SciPy, they can be computed by hand from the definition.


A residual is the difference between an observed value and a predicted value in a regression model.

It is calculated as:

Residual = Observed value – Predicted value

If we plot the observed values and overlay the fitted regression line, the residual for each observation is the vertical distance between the observation and the regression line:

[Figure: Example of a residual in statistics]
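As a quick illustration of the definition above, the sketch below subtracts predicted values from observed values. The numbers are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from a fitted model.

```python
# Hypothetical observed and predicted values (illustration only)
observed = [10.0, 12.0, 15.0]
predicted = [9.5, 12.5, 14.0]

# Residual = Observed value - Predicted value
residuals = [obs - pred for obs, pred in zip(observed, predicted)]

print(residuals)
```

A positive residual means the observation lies above the regression line, and a negative residual means it lies below.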

One type of residual we often use to identify outliers in a regression model is known as a standardized residual.

It is calculated as:

$$r_i = \frac{e_i}{s(e_i)} = \frac{e_i}{\mathrm{RSE}\,\sqrt{1 - h_{ii}}}$$

where:

  • $e_i$: The ith residual
  • $\mathrm{RSE}$: The residual standard error of the model
  • $h_{ii}$: The leverage of the ith observation

In practice, we often consider any standardized residual with an absolute value greater than 3 to be an outlier.
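The rule of thumb above can be sketched as a small helper. The function name `flag_outliers`, its `threshold` parameter, and the example values are all hypothetical and serve only to illustrate the check.

```python
def flag_outliers(standardized_residuals, threshold=3.0):
    """Return the positions whose absolute standardized residual exceeds the threshold."""
    return [i for i, r in enumerate(standardized_residuals) if abs(r) > threshold]

# Hypothetical standardized residuals (illustration only)
example = [0.4, -1.2, 3.5, 0.1, -3.2]

flagged = flag_outliers(example)
print(flagged)  # positions 2 and 4 have |r| > 3
```

The cutoff of 3 is a convention rather than a strict rule, so it is kept as a parameter here.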

This tutorial provides a step-by-step example of how to calculate standardized residuals in Python.

Step 1: Enter the Data

First, we’ll create a small dataset to work with in Python:

import pandas as pd

#create dataset
df = pd.DataFrame({'x': [8, 12, 12, 13, 14, 16, 17, 22, 24, 26, 29, 30],
                   'y': [41, 42, 39, 37, 35, 39, 45, 46, 39, 49, 55, 57]})

Step 2: Fit the Regression Model

Next, we’ll fit a simple linear regression model:

import statsmodels.api as sm

#define response variable
y = df['y']

#define explanatory variable
x = df['x']

#add constant to predictor variables
x = sm.add_constant(x)

#fit linear regression model
model = sm.OLS(y, x).fit() 

Step 3: Calculate the Standardized Residuals

Next, we’ll calculate the standardized residuals of the model:

#create instance of influence
influence = model.get_influence()

#obtain standardized (internally studentized) residuals
standardized_residuals = influence.resid_studentized_internal

#display standardized residuals
print(standardized_residuals)

[ 1.40517322  0.81017562  0.07491009 -0.59323342 -1.2482053  -0.64248883
  0.59610905 -0.05876884 -2.11711982 -0.066556    0.91057211  1.26973888]

From the results we can see that none of the standardized residuals exceed an absolute value of 3. Thus, none of the observations appear to be outliers.
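To tie these values back to the formula, the sketch below recomputes them without statsmodels. It assumes a simple linear regression with one predictor, where the leverage has the closed form h_ii = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², and it uses n − 2 degrees of freedom for the residual standard error. The function name `standardized_residuals_simple` is hypothetical.

```python
import math

# Same data as in Step 1
x = [8, 12, 12, 13, 14, 16, 17, 22, 24, 26, 29, 30]
y = [41, 42, 39, 37, 35, 39, 45, 46, 39, 49, 55, 57]

def standardized_residuals_simple(x, y):
    """Standardized residuals for a one-predictor least squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least squares slope and intercept
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Raw residuals: observed minus predicted
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Residual standard error with n - 2 degrees of freedom
    rse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    # Leverage of each observation (closed form for one predictor)
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    return [e / (rse * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]

manual = standardized_residuals_simple(x, y)
print([round(r, 4) for r in manual])
```

The first value comes out at roughly 1.4052 and the ninth at roughly -2.1171, in agreement with the statsmodels output above. For models with more than one predictor, the leverages come from the diagonal of the hat matrix instead of this closed form.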

Step 4: Visualize the Standardized Residuals

Lastly, we can create a scatterplot to visualize the values for the predictor variable vs. the standardized residuals:

import matplotlib.pyplot as plt

plt.scatter(df.x, standardized_residuals)
plt.xlabel('x')
plt.ylabel('Standardized Residuals')
plt.axhline(y=0, color='black', linestyle='--', linewidth=1)
plt.show()

[Figure: Plot of standardized residuals in Python]
