How to perform a Lack of Fit Test in R (Step-by-Step)

To perform a Lack of Fit Test in R, first define the model of interest, then fit the model to the data, and use the anova() function to compare the full model to a reduced model. The reduced model should be a subset of the full model, and should not include any terms that are part of the lack of fit test. Finally, calculate the Lack of Fit Test statistic and compare it to the F critical value to determine if the lack of fit is significant.


A lack of fit test is used to determine whether or not a full offers a significantly better fit to a dataset than some reduced version of the model.

For example, suppose we would like to use number of hours studied to predict exam score for students at a certain college. We may decide to fit the following two regression models:

Full Model: Score = β0 + B1(hours) + B2(hours)2

Reduced Model: Score = β0 + B1(hours)

The following step-by-step example shows how to perform a lack of fit test in R to determine if the full model offers a significantly better fit than the reduced model.

Step 1: Create & Visualize a Dataset

First, we’ll use the following code to create a dataset that contains the number of hours studied and exam score received for 50 students:

#make this example reproducible
set.seed(1)

#create dataset
df <- data.frame(hours = runif(50, 5, 15), score=50)
df$score = df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

#view first six rows of data
head(df)

      hours    score
1  7.655087 64.30191
2  8.721239 70.65430
3 10.728534 73.66114
4 14.082078 86.14630
5  7.016819 59.81595
6 13.983897 83.60510

Next, we’ll create a scatterplot to visualize the relationship between hours and score:

#load ggplot2 visualization package
library(ggplot2)

#create scatterplot
ggplot(df, aes(x=hours, y=score)) +
  geom_point()

Step 2: Fit Two Different Models to the Dataset

Next, we’ll fit two different regression models to the dataset:

#fit full model
full <- lm(score ~ poly(hours,2), data=df)

#fit reduced model
reduced <- lm(score ~ hours, data=df) 

Step 3: Perform a Lack of Fit Test

Next, we’ll use the anova() command to perform a lack of fit test between the two models:

#lack of fit test
anova(full, reduced)

Analysis of Variance Table

Model 1: score ~ poly(hours, 2)
Model 2: score ~ hours
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     47 368.48                                
2     48 451.22 -1   -82.744 10.554 0.002144 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Step 4: Visualize the Final Model

Lastly, we can visualize the final model (the full model) relative to the original dataset:

ggplot(df, aes(x=hours, y=score)) + 
          geom_point() +
          stat_smooth(method='lm', formula = y ~ poly(x,2), size = 1) + 
          xlab('Hours Studied') +
          ylab('Score')

Visualizing lack of fit in R

We can see that the curve of the model fits the data quite well.

How to Perform Polynomial Regression in R

x