How do you perform a partial F-test?


A partial F-test is used to determine whether there is a statistically significant difference between a regression model and a nested version of the same model.

A nested model is simply one that contains a subset of the predictor variables in the overall regression model.

For example, suppose we have the following regression model with four predictor variables:

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

One example of a nested model would be the following model with only two of the original predictor variables:

Y = β0 + β1x1 + β2x2 + ε
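To make the idea of "nested" concrete, here is a minimal R sketch. The helper `is_nested()` is hypothetical (it is not a base R function). It only compares the predictor names in two formulas and ignores the response and the intercept. It does not check that both models are fitted to the same observations, which a partial F-test also requires.

```r
# Hypothetical helper: TRUE when every predictor of `reduced` also appears in `full`.
# Only the right-hand-side term labels are compared; response and intercept are ignored.
is_nested <- function(reduced, full) {
  reduced_terms <- attr(terms(reduced), "term.labels")
  full_terms <- attr(terms(full), "term.labels")
  all(reduced_terms %in% full_terms)
}

# The two-predictor model above is nested in the four-predictor model.
is_nested(Y ~ x1 + x2, Y ~ x1 + x2 + x3 + x4)

# A model with a predictor (x5) that the full model lacks is not nested.
is_nested(Y ~ x1 + x5, Y ~ x1 + x2 + x3 + x4)
```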

To determine if these two models are significantly different, we can perform a partial F-test.

Partial F-Test: The Basics

A partial F-test calculates the following F test-statistic:

F = ((RSSreduced – RSSfull) / p)  /  (RSSfull / (n – k))

where:

  • RSSreduced: The residual sum of squares of the reduced (i.e. “nested”) model.
  • RSSfull: The residual sum of squares of the full model.
  • p: The number of predictors removed from the full model.
  • n: The total number of observations in the dataset.
  • k: The number of coefficients (including the intercept) in the full model.
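The formula above translates directly into a few lines of R. This is a minimal sketch. The function name `partial_f_stat()` is hypothetical. The inputs are the rounded RSS values printed in the mtcars example later in this tutorial (n = 32 rows, k = 5 coefficients in the full model, p = 2 predictors removed).

```r
# Hypothetical helper implementing the partial F statistic defined above.
partial_f_stat <- function(rss_reduced, rss_full, p, n, k) {
  extra_rss_per_predictor <- (rss_reduced - rss_full) / p  # RSS saved per removed predictor
  mse_full <- rss_full / (n - k)                            # residual mean square of the full model
  extra_rss_per_predictor / mse_full
}

# Rounded RSS values from the mtcars example in this tutorial:
# n = 32 rows, k = 5 coefficients in the full model, p = 2 predictors removed.
f_stat <- partial_f_stat(rss_reduced = 254.82, rss_full = 238.71, p = 2, n = 32, k = 5)
round(f_stat, 3)  # 0.911
```

The result differs slightly from the 0.9113 reported by ANOVA only because these RSS values are rounded to two decimals.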

Note that the residual sum of squares of the full model can never be larger than that of the reduced model, since adding predictors to a least-squares fit never increases the error and almost always reduces it at least slightly.
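A quick way to see this is to add a predictor that is pure random noise. The following R sketch assumes the built-in mtcars data and an arbitrary seed. The noise column is made up purely for illustration. Even this meaningless predictor lowers the RSS, which is why a raw drop in RSS is not evidence on its own.

```r
set.seed(1)  # make the random noise column reproducible

# Copy the built-in mtcars data and add a predictor that is pure random noise.
cars_df <- mtcars
cars_df$noise <- rnorm(nrow(cars_df))

# deviance() on an lm fit returns its residual sum of squares (RSS).
rss_without_noise <- deviance(lm(mpg ~ disp, data = cars_df))
rss_with_noise <- deviance(lm(mpg ~ disp + noise, data = cars_df))

# The RSS still drops (or at worst stays the same) even though `noise` is meaningless.
rss_with_noise <= rss_without_noise
rss_without_noise - rss_with_noise  # size of the drop
```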

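Because even useless predictors lower the RSS somewhat, the real question is whether the drop is larger than we would expect by chance. A partial F-test therefore tests whether the group of predictors you removed from the full model is actually useful and needs to be included in the model.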

This test uses the following null and alternative hypotheses:

H0: All coefficients removed from the full model are zero.

HA: At least one of the coefficients removed from the full model is non-zero.
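Under H0 the statistic follows an F distribution with p and n – k degrees of freedom. The decision can therefore be made from either the p-value or a critical value. This minimal R sketch uses the F statistic and degrees of freedom from the mtcars example later in this tutorial and assumes a 0.05 significance level.

```r
# Values from the mtcars example in this tutorial.
f_stat <- 0.9113  # observed partial F statistic
df1 <- 2          # p: predictors removed from the full model
df2 <- 32 - 5     # n - k: residual degrees of freedom of the full model
alpha <- 0.05     # significance level

# p-value: upper-tail probability of the F(df1, df2) distribution under H0 (about 0.414).
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Equivalent rule: reject H0 when the F statistic exceeds the (1 - alpha) quantile.
f_crit <- qf(1 - alpha, df1, df2)

reject_h0 <- p_value < alpha
print(round(c(p_value = p_value, f_crit = f_crit), 3))
print(reject_h0)
```

Both rules always agree. The p-value is below alpha exactly when the F statistic is above the critical value.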

Partial F-Test: An Example

In practice, we use the following steps to perform a partial F-test:

1. Fit the full regression model and calculate RSSfull.

2. Fit the nested regression model and calculate RSSreduced.

3. Perform an ANOVA to compare the full and reduced models, which produces the F test-statistic (and its p-value) needed to compare them.

For example, the code below shows how to fit these two regression models in R using data from the built-in mtcars dataset:

Full model: mpg = β0 + β1disp + β2carb + β3hp + β4cyl

Reduced model: mpg = β0 + β1disp + β2carb

#fit full model
model_full <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)

#fit reduced model
model_reduced <- lm(mpg ~ disp + carb, data = mtcars)

#perform ANOVA to test for differences in models
anova(model_reduced, model_full)

Analysis of Variance Table

Model 1: mpg ~ disp + carb
Model 2: mpg ~ disp + carb + hp + cyl
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     29 254.82                           
2     27 238.71  2    16.113 0.9113  0.414
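To see where the F and Pr(>F) columns come from, this R sketch recomputes them from the partial F formula and checks them against the ANOVA table. It assumes only base R: `deviance()` returns an lm fit's RSS, and `df.residual()` returns n – k.

```r
# Refit the two models from the example.
model_full <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)
model_reduced <- lm(mpg ~ disp + carb, data = mtcars)

# deviance() on an lm fit returns the residual sum of squares.
rss_full <- deviance(model_full)
rss_reduced <- deviance(model_reduced)

k <- length(coef(model_full))         # coefficients in the full model, incl. intercept (5)
p <- k - length(coef(model_reduced))  # predictors removed (2)
resid_df <- df.residual(model_full)   # n - k (27)

f_manual <- ((rss_reduced - rss_full) / p) / (rss_full / resid_df)
p_manual <- pf(f_manual, p, resid_df, lower.tail = FALSE)

# Compare with the F and Pr(>F) columns of the ANOVA table.
aov_table <- anova(model_reduced, model_full)
all.equal(f_manual, aov_table[["F"]][2])
all.equal(p_manual, aov_table[["Pr(>F)"]][2])
```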

From the output we can see that the F test-statistic from the ANOVA is 0.9113 and the corresponding p-value is 0.414.

Since this p-value is not less than .05, we fail to reject the null hypothesis. This means we don’t have sufficient evidence to say that either of the predictor variables hp or cyl is statistically significant (i.e., has a non-zero coefficient) in a model that already contains disp and carb.

In other words, adding hp and cyl to the regression model does not significantly improve the fit of the model.
