How to Calculate Residual Sum of Squares in R


A residual is the difference between an observed value and a predicted value in a regression model.

It is calculated as:

Residual = Observed value – Predicted value
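
As a minimal sketch of this definition, the residuals of a fitted model can be rebuilt by hand and compared with what R's resid() function returns. The built-in mtcars dataset and the single-predictor formula are assumptions chosen only for illustration:

```r
# fit an illustrative model (dataset and formula are assumptions for this sketch)
model <- lm(mpg ~ wt, data = mtcars)

# residual = observed value - predicted value
manual_resid <- mtcars$mpg - fitted(model)

# compare with the residuals R stores for the model; this should return TRUE
all.equal(unname(manual_resid), unname(resid(model)))
```

The call to unname() only strips the row labels so that the comparison looks at the numeric values alone.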

One way to understand how well a regression model fits a dataset is to calculate the residual sum of squares, which is calculated as:

Residual sum of squares = Σ(eᵢ)²

where:

  • Σ: A Greek symbol that means “sum”
  • eᵢ: The ith residual

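The lower the value, the better a model fits a dataset. Because the value depends on the scale of the response variable and on the number of observations, it is only meaningful for comparing models that are fit to the same dataset.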

We can easily calculate the residual sum of squares for a regression model in R by using one of the following two methods:

#build regression model
model <- lm(y ~ x1 + x2 + ..., data = df)

#calculate residual sum of squares (method 1)
deviance(model)

#calculate residual sum of squares (method 2)
sum(resid(model)^2)

Both methods produce the same result.

The following example shows how to use these functions in practice.

Example: Calculating Residual Sum of Squares in R

For this example, we’ll use the built-in mtcars dataset in R:

#view first six rows of mtcars dataset
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The following code shows how to fit a multiple linear regression model for this dataset and calculate the residual sum of squares of the model:

#build multiple linear regression model
model <- lm(mpg ~ wt + hp, data = mtcars)

#calculate residual sum of squares (method 1)
deviance(model)

[1] 195.0478

#calculate residual sum of squares (method 2)
sum(resid(model)^2)

[1] 195.0478
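
To see where this total comes from, the following sketch breaks the residual sum of squares into its per-observation pieces. The variable name sq_resid is hypothetical and introduced here only for illustration:

```r
# rebuild the model from the example above
model <- lm(mpg ~ wt + hp, data = mtcars)

# squared residual for each car in the dataset
sq_resid <- resid(model)^2

# the three cars that contribute most to the residual sum of squares
head(sort(sq_resid, decreasing = TRUE), 3)

# adding up every contribution gives the same total as deviance()
all.equal(sum(sq_resid), deviance(model))
```

Looking at the largest squared residuals can help identify the observations that the model predicts least accurately.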

If we have two competing models, we can calculate the residual sum of squares for both to determine which one fits the data better:

#build two different models
model1 <- lm(mpg ~ wt + hp, data = mtcars)
model2 <- lm(mpg ~ wt + disp, data = mtcars)

#calculate residual sum of squares for both models
deviance(model1)

[1] 195.0478

deviance(model2)

[1] 246.6825

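We can see that the residual sum of squares for model 1 is lower, which indicates that it fits the data better than model 2. The comparison is fair here because both models use the same response variable and the same number of predictors.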
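
One caveat when comparing models this way: for least squares models fit to the same data, adding a predictor to an existing model can never increase the residual sum of squares, so a larger model will look at least as good on this measure even if the extra predictor is not useful. The sketch below illustrates this; model3 is a hypothetical model that is not part of the example above:

```r
# model from the example above
model1 <- lm(mpg ~ wt + hp, data = mtcars)

# hypothetical larger model: same predictors plus disp
model3 <- lm(mpg ~ wt + hp + disp, data = mtcars)

# the larger model cannot have a higher residual sum of squares
deviance(model3) <= deviance(model1)
```

When the models being compared have different numbers of predictors, a measure that accounts for model size, such as the adjusted R-squared available from summary(model1)$adj.r.squared, may be a more suitable basis for comparison.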

We can confirm this by calculating the R-squared of each model:

#build two different models
model1 <- lm(mpg ~ wt + hp, data = mtcars)
model2 <- lm(mpg ~ wt + disp, data = mtcars)

#calculate R-squared for both models
summary(model1)$r.squared

[1] 0.8267855

summary(model2)$r.squared

[1] 0.7809306

The R-squared for model 1 turns out to be higher, which indicates that it’s able to explain more of the variance in the response values compared to model 2.
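
The two measures agree because, for a linear model with an intercept, R-squared can be written as 1 minus the residual sum of squares divided by the total sum of squares. The sketch below checks this relationship for model 1; the variable names rss, tss and r2_manual are hypothetical and used only for illustration:

```r
# model from the example above
model1 <- lm(mpg ~ wt + hp, data = mtcars)

# residual sum of squares
rss <- deviance(model1)

# total sum of squares: squared deviations of the response from its mean
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared computed from the two sums of squares
r2_manual <- 1 - rss / tss

# compare with the value reported by summary(); this should return TRUE
all.equal(r2_manual, summary(model1)$r.squared)
```

Since both models share the same response variable, the total sum of squares is identical for each, so the model with the lower residual sum of squares necessarily has the higher R-squared.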
