How to Perform Robust Regression in R (Step-by-Step)

Robust regression is a type of regression analysis that reduces the influence of outliers or incorrect data points on the fitted model. To perform robust regression in R, we load the MASS package (included with standard R installations), fit the model with the rlm() function, optionally adjust settings such as the estimation method and the maximum number of iterations, and then analyze and interpret the results.


Robust regression is a method we can use as an alternative to ordinary least squares regression when there are outliers or influential observations in the dataset we’re working with.

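To perform robust regression in R, we can use the rlm() function from the MASS package, which uses the following syntax:

rlm(formula, data)

Here formula describes the model (for example, y ~ x1 + x2) and data is the data frame that contains the variables.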
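As a minimal sketch of the call pattern, the block below fits rlm() with its defaults and then with two optional arguments, psi and maxit, which control the weighting function and the maximum number of iterations. The built-in stackloss dataset and the object names fit_huber and fit_bisquare are stand-ins chosen for illustration; they are not part of this tutorial's example.

```r
library(MASS)

# Default fit: M-estimation with Huber weights.
# stackloss is a built-in R dataset used here only as a stand-in.
fit_huber <- rlm(stack.loss ~ ., data = stackloss)

# Same model with Tukey's bisquare weights and a larger iteration limit.
# psi and maxit are optional arguments of rlm(); the default maxit is 20.
fit_bisquare <- rlm(stack.loss ~ ., data = stackloss,
                    psi = psi.bisquare, maxit = 50)

# Compare the estimated coefficients from the two fits
cbind(Huber = coef(fit_huber), Bisquare = coef(fit_bisquare))
```

Huber weights are the default; bisquare weights down-weight extreme observations more aggressively, so which one is appropriate depends on how severe the outliers are.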

The following step-by-step example shows how to perform robust regression in R for a given dataset.

Step 1: Create the Data

First, let’s create a fake dataset to work with:

#create data
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3,
                      11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12,
                      25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                  y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32,
                      44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

#view first six rows of data
head(df)

  x1 x2   y
1  1  7  17
2  3  7 170
3  3  4  19
4  4 29 194
5  4 13  24
6  6 34   2

Step 2: Perform Ordinary Least Squares Regression

Next, let’s fit an ordinary least squares regression model and create a plot of the standardized residuals.

In practice, we often consider any standardized residual with an absolute value greater than 3 to be an outlier.

#fit ordinary least squares regression model
ols <- lm(y~x1+x2, data=df)

#create plot of y-values vs. standardized residuals
plot(df$y, rstandard(ols), ylab='Standardized Residuals', xlab='y') 
abline(h=0)

From the plot we can see that there are two observations with standardized residuals around 3.

This is an indication that there are two potential outliers in the dataset and thus we may benefit from performing robust regression instead.
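Reading outliers off a plot can be imprecise, so it may help to list the flagged observations directly. The sketch below recreates the same data and OLS model, then selects rows whose absolute standardized residual exceeds a cutoff. The names std_resid, cutoff, and flagged are illustrative, and the cutoff of 2 is an assumption chosen to catch borderline cases near the rule-of-thumb value of 3.

```r
# Same data as in Step 1
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3,
                      11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12,
                      25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                  y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32,
                      44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

# Fit the OLS model from Step 2
ols <- lm(y~x1+x2, data=df)

# One standardized residual per observation
std_resid <- rstandard(ols)

# Assumed cutoff for illustration; raise it to 3 for the stricter rule of thumb
cutoff <- 2
flagged <- which(abs(std_resid) > cutoff)

# Show the flagged rows next to their standardized residuals
cbind(df[flagged, ], std_resid = std_resid[flagged])
```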

Step 3: Perform Robust Regression

Next, let’s use the rlm() function to fit a robust regression model:

library(MASS)

#fit robust regression model
robust <- rlm(y~x1+x2, data=df)
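To see what the robust fit actually changed, one option is to compare its coefficients with the OLS coefficients and to inspect the final weights that rlm() assigned to each observation, which are stored in the w component of the fitted object. The following is a minimal sketch under that approach; weights_df is an illustrative name, and showing the five lowest weights is an arbitrary choice.

```r
library(MASS)

# Same data as in Step 1
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3,
                      11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12,
                      25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                  y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32,
                      44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

ols <- lm(y~x1+x2, data=df)
robust <- rlm(y~x1+x2, data=df)

# Coefficients from both models, side by side
round(cbind(OLS = coef(ols), Robust = coef(robust)), 3)

# Final weights from the iteratively reweighted fit:
# a weight of 1 means the observation was not down-weighted
weights_df <- data.frame(obs = seq_len(nrow(df)),
                         y = df$y,
                         weight = round(robust$w, 3))

# The five observations with the lowest weights
head(weights_df[order(weights_df$weight), ], 5)
```

Observations with weights well below 1 are the ones the robust model treats as unusual, so they should largely coincide with the points that stood out in the residual plot.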

To determine if this robust regression model offers a better fit to the data compared to the OLS model, we can calculate the residual standard error of each model.

The following code shows how to calculate the RSE for each model:

#find residual standard error of ols model
summary(ols)$sigma

[1] 49.41848

#find residual standard error of robust model
summary(robust)$sigma

[1] 9.369349

We can see that the RSE for the robust regression model is much lower than that of the ordinary least squares regression model, which suggests that the robust regression model offers a better fit to the bulk of the data.
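The value that summary() reports for an rlm() fit is a robust scale estimate, so it is not computed in the same way as the OLS residual standard error. As a supplementary check, and not part of the original comparison, the sketch below applies one common resistant summary, the median absolute residual, to both models. The name med_abs_resid is illustrative.

```r
library(MASS)

# Same data as in Step 1
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3,
                      11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12,
                      25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                  y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32,
                      44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

ols <- lm(y~x1+x2, data=df)
robust <- rlm(y~x1+x2, data=df)

# Median absolute residual for each model, computed the same way for both
med_abs_resid <- c(OLS = median(abs(resid(ols))),
                   Robust = median(abs(resid(robust))))
med_abs_resid
```

A smaller median absolute residual indicates that the model sits closer to the typical observation, regardless of how far the outliers lie from the fitted surface.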
