Standardized residuals put the residuals of a regression model on a common scale so that they can be compared across observations. Each one is calculated by dividing the residual of an observation by an estimate of that residual’s standard deviation. Because the results are all on the same scale, they are easier to interpret than raw residuals.

A residual is the difference between an observed value and a predicted value in a regression model.

It is calculated as:

Residual = Observed value – Predicted value

If we plot the observed values and overlay the fitted regression line, the residual for each observation is the vertical distance between that observation and the regression line.

One type of residual we often use to identify outliers in a regression model is known as a standardized residual.

It is calculated as:

r_i = e_i / s(e_i) = e_i / (RSE * √(1 - h_ii))

where:

e_i: The i^th residual
RSE: The residual standard error of the model
h_ii: The leverage of the i^th observation
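To make the formula concrete, here is a minimal Python sketch of it. The function name standardized_residual and the input values are made up for illustration and do not come from the example dataset. It shows that the same raw residual produces a larger standardized residual when the observation has higher leverage.

```python
import math


def standardized_residual(residual, rse, leverage):
    """Return e_i / (RSE * sqrt(1 - h_ii)) for a single observation."""
    if not 0 <= leverage < 1:
        raise ValueError("leverage must satisfy 0 <= h_ii < 1")
    return residual / (rse * math.sqrt(1 - leverage))


# Illustrative (made-up) inputs: same raw residual, different leverage.
low_leverage = standardized_residual(residual=6.0, rse=4.0, leverage=0.10)
high_leverage = standardized_residual(residual=6.0, rse=4.0, leverage=0.60)

print(round(low_leverage, 3), round(high_leverage, 3))
```

The higher-leverage observation gets the larger standardized residual because the term √(1 - h_ii) shrinks the denominator.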

In practice, we often consider any standardized residual with an absolute value greater than 3 to be an outlier.

This doesn’t necessarily mean that we’ll remove these observations from the model, but we should at least investigate them further to verify that they’re not a result of a data entry error or some other odd occurrence.

Note: Sometimes standardized residuals are also referred to as “internally studentized residuals.”

Example: How to Calculate Standardized Residuals

Suppose we have the following dataset with 12 total observations:

If we use statistical software (such as R, Excel, Python, or Stata) to fit a linear regression line to this dataset, we’ll find that the line of best fit turns out to be:

y = 29.63 + 0.7553*(x)

Using this line, we can calculate the predicted value for each Y value based on the value of X. For example, the predicted value of the first observation would be:

y = 29.63 + 0.7553*(8) = 35.67

We can then calculate the residual for this observation as:

Residual = Observed value – Predicted value = 41 – 35.67 = 5.33
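The two calculations above can be sketched in Python as follows. The intercept, slope, and first observation (x = 8, y = 41) are taken from the text; the function names predict and residual are just illustrative choices.

```python
# Coefficients of the fitted line, as reported in the text.
INTERCEPT = 29.63
SLOPE = 0.7553


def predict(x):
    """Predicted value from the fitted regression line."""
    return INTERCEPT + SLOPE * x


def residual(observed, x):
    """Residual = observed value - predicted value."""
    return observed - predict(x)


# First observation from the text: x = 8, y = 41.
predicted_first = predict(8)
residual_first = residual(41, 8)

print(round(predicted_first, 2), round(residual_first, 2))  # 35.67 5.33
```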

We can repeat this process to find the residual for every single observation:

We can also use statistical software to find that the residual standard error of the model is 4.44.
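For reference, the residual standard error is the square root of the sum of squared residuals divided by the residual degrees of freedom, which is n - 2 for a model with one predictor and an intercept. Here is a rough sketch of that calculation. The residuals in it are made up for illustration and are not the residuals from the dataset above.

```python
import math


def residual_standard_error(residuals, n_predictors=1):
    """RSE = sqrt(sum of squared residuals / (n - n_predictors - 1))."""
    dof = len(residuals) - n_predictors - 1
    if dof <= 0:
        raise ValueError("not enough observations for this many predictors")
    sum_sq = sum(e * e for e in residuals)
    return math.sqrt(sum_sq / dof)


# Made-up residuals for illustration only (they sum to zero, as residuals
# from a least squares fit with an intercept do).
example_residuals = [2.0, -1.0, -3.0, 1.5, 0.5]
example_rse = residual_standard_error(example_residuals)

print(round(example_rse, 3))
```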

And, although it’s beyond the scope of this tutorial, we can use software to find the leverage statistic (h_ii) for each observation:
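For readers who are curious, in simple linear regression (one predictor plus an intercept) the leverage has a closed form: h_ii = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The sketch below assumes that setting. The x values in it are made up for illustration and are not the predictor values from the dataset above.

```python
def leverages(x_values):
    """Leverage h_ii of each observation in a simple linear regression."""
    n = len(x_values)
    x_mean = sum(x_values) / n
    sxx = sum((x - x_mean) ** 2 for x in x_values)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    return [1 / n + (x - x_mean) ** 2 / sxx for x in x_values]


# Made-up predictor values; the last one sits far from the others.
example_x = [1, 2, 3, 4, 10]
example_h = leverages(example_x)

print([round(h, 2) for h in example_h])
```

Observations whose x value is far from the mean get the highest leverage, and the leverages in this setting always sum to 2 (the number of estimated coefficients).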

We can then use the following formula to calculate the standardized residual for each observation:

r_i = e_i / (RSE * √(1 - h_ii))

For example, the standardized residual for the first observation is calculated as:

r_1 = 5.33 / (4.44 * √(1 - 0.27)) ≈ 1.404
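As a quick check of this arithmetic, here is a short Python sketch that uses only numbers quoted in the text (RSE = 4.44, leverage = 0.27, and the first observation x = 8, y = 41). It suggests that the 1.404 figure comes from carrying the unrounded residual through the calculation, since plugging in the rounded residual of 5.33 gives a value that differs slightly in the third decimal place.

```python
import math

rse = 4.44             # residual standard error reported in the text
leverage_first = 0.27  # leverage of the first observation, from the text

residual_rounded = 5.33                         # residual as shown in the text
residual_unrounded = 41 - (29.63 + 0.7553 * 8)  # same residual, not rounded

denominator = rse * math.sqrt(1 - leverage_first)
r_rounded = residual_rounded / denominator
r_unrounded = residual_unrounded / denominator

print(round(r_rounded, 3), round(r_unrounded, 3))  # 1.405 1.404
```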

We can repeat this process to find the standardized residual for each observation:

We can then create a quick scatterplot of the predictor values vs. standardized residuals to visually see if any of the standardized residuals exceed an absolute value threshold of 3:

From the plot we can see that none of the standardized residuals exceed an absolute value of 3. Thus, none of the observations appear to be outliers.

It’s worth noting that some researchers consider any observation whose standardized residual exceeds an absolute value of 2 to be an outlier.

It’s up to you to decide, depending on the field you’re working in and the specific problem you’re working on, whether to use an absolute value of 2 or 3 as the threshold for outliers.
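To see how the choice of threshold changes which observations get flagged, here is a small sketch. The function name flag_outliers and the standardized residuals passed to it are made up for illustration and are not the values from the example above.

```python
def flag_outliers(standardized_residuals, threshold=3.0):
    """Return the indices whose |standardized residual| exceeds the threshold."""
    return [
        i for i, r in enumerate(standardized_residuals)
        if abs(r) > threshold
    ]


# Made-up standardized residuals for illustration only.
example_values = [1.40, -0.35, 2.31, -2.08, 0.12, -3.40]

flagged_at_3 = flag_outliers(example_values, threshold=3.0)
flagged_at_2 = flag_outliers(example_values, threshold=2.0)

print(flagged_at_3)  # [5]
print(flagged_at_2)  # [2, 3, 5]
```

With the stricter threshold of 3 only one observation is flagged for a closer look, while a threshold of 2 flags three.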
