A residual is the difference between an observed value and a predicted value in a regression model.
It is calculated as:
Residual = Observed value – Predicted value
One way to understand how well a regression model fits a dataset is to calculate the residual sum of squares, which is calculated as:
Residual sum of squares = Σ(eᵢ)²
where:
- Σ: A Greek symbol that means “sum”
- eᵢ: The ith residual
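As a minimal sketch of the formula above, the following computes the residuals and their sum of squares by hand. The vectors `observed` and `predicted` hold made-up illustrative values and do not come from any dataset used in this tutorial.

```r
#hypothetical observed and predicted values (illustrative only)
observed <- c(10, 12, 15, 19)
predicted <- c(11, 12, 14, 17)

#residual = observed value - predicted value
res <- observed - predicted

#residual sum of squares = sum of the squared residuals
rss <- sum(res^2)
rss  #returns 6
```

The residuals here are -1, 0, 1 and 2, so the squared residuals sum to 1 + 0 + 1 + 4 = 6.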
For models fit to the same dataset, the lower the residual sum of squares, the better the model fits the data.
We can easily calculate the residual sum of squares for a regression model in R by using one of the following two methods:
#build regression model
model <- lm(y ~ x1 + x2 + ..., data = df)

#calculate residual sum of squares (method 1)
deviance(model)

#calculate residual sum of squares (method 2)
sum(resid(model)^2)
For a model fit with lm(), deviance() returns the residual sum of squares, so both methods produce the same result.
The following example shows how to use these functions in practice.
Example: Calculating Residual Sum of Squares in R
For this example, we’ll use the built-in mtcars dataset in R:
#view first six rows of mtcars dataset
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
The following code shows how to fit a multiple linear regression model for this dataset and calculate the residual sum of squares of the model:
#build multiple linear regression model
model <- lm(mpg ~ wt + hp, data = mtcars)

#calculate residual sum of squares (method 1)
deviance(model)
[1] 195.0478

#calculate residual sum of squares (method 2)
sum(resid(model)^2)
[1] 195.0478
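To tie this result back to the definition of a residual, here is a hedged sketch that rebuilds the residuals as observed minus fitted values and checks the total against deviance(). The names `manual_resid` and `manual_rss` are introduced here for illustration only.

```r
#fit the same model used in this example
model <- lm(mpg ~ wt + hp, data = mtcars)

#compute residuals manually as observed value minus fitted value
manual_resid <- mtcars$mpg - fitted(model)

#sum the squared residuals
manual_rss <- sum(manual_resid^2)

#check that the manual value agrees with deviance() (should return TRUE)
all.equal(manual_rss, deviance(model))
```

all.equal() is used instead of == so that tiny floating-point differences do not cause a spurious mismatch.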
If we have two competing models, we can calculate the residual sum of squares for both to determine which one fits the data better:
#build two different models
model1 <- lm(mpg ~ wt + hp, data = mtcars)
model2 <- lm(mpg ~ wt + disp, data = mtcars)

#calculate residual sum of squares for both models
deviance(model1)
[1] 195.0478

deviance(model2)
[1] 246.6825
We can see that the residual sum of squares for model 1 is lower, which indicates that it fits the data better than model 2.
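With more than two candidate models, the same comparison can be done in one pass. The sketch below is one possible approach, not the only one: it stores the models in a named list (the name `models` is an assumption for illustration) and applies deviance() to each.

```r
#store the competing models in a named list
models <- list(
  model1 = lm(mpg ~ wt + hp, data = mtcars),
  model2 = lm(mpg ~ wt + disp, data = mtcars)
)

#calculate residual sum of squares for every model in the list
rss <- sapply(models, deviance)
rss

#name of the model with the lowest residual sum of squares
best <- names(which.min(rss))
best  #returns "model1"
```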
We can confirm this by calculating the R-squared of each model:
#build two different models
model1 <- lm(mpg ~ wt + hp, data = mtcars)
model2 <- lm(mpg ~ wt + disp, data = mtcars)

#calculate R-squared for both models
summary(model1)$r.squared
[1] 0.8267855

summary(model2)$r.squared
[1] 0.7809306
The R-squared for model 1 turns out to be higher, which indicates that it’s able to explain more of the variance in the response values compared to model 2.
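The two measures agree because, for a linear model that includes an intercept, R-squared equals 1 minus the residual sum of squares divided by the total sum of squares of the response. The following sketch checks that relationship for both models. The names `tss`, `r2_model1` and `r2_model2` are introduced here for illustration.

```r
#fit the two models from the example
model1 <- lm(mpg ~ wt + hp, data = mtcars)
model2 <- lm(mpg ~ wt + disp, data = mtcars)

#total sum of squares of the response around its mean
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

#R-squared computed from the residual sum of squares
r2_model1 <- 1 - deviance(model1) / tss
r2_model2 <- 1 - deviance(model2) / tss

#compare with the values reported by summary() (both should return TRUE)
all.equal(r2_model1, summary(model1)$r.squared)
all.equal(r2_model2, summary(model2)$r.squared)
```

Since both models share the same response variable, the total sum of squares is identical for both, so the model with the lower residual sum of squares necessarily has the higher R-squared.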