What is the difference between aov() and anova() in R?

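The aov() function in R fits an analysis of variance model, such as a one-way ANOVA, and reports sequential (type I) sums of squares. When anova() builds an ANOVA table for a fitted linear model, it also uses sequential (type I) sums of squares; type II sums of squares are usually obtained with Anova() from the car package. The choice between type I and type II does not depend on whether groups have equal sample sizes. For a one-way ANOVA with a single factor the two are identical. They differ only in models with several terms when the design is unbalanced.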
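Type I sums of squares are sequential: each term is tested after the terms listed before it in the formula. A one-way ANOVA has only one term, so order cannot matter. With two factors in an unbalanced design, the sum of squares for a factor changes with its position. The sketch below shows this with aov(). It uses the built-in mtcars data set only as a convenient unbalanced example (an assumption of this sketch, not data from the article), and the object names are illustrative.

```r
# Copy of the built-in mtcars data with cyl and am treated as factors
# (mtcars is used here only as an example of an unbalanced two-factor design)
mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am  <- factor(mt$am)

# Same two factors, entered in opposite orders
fit_cyl_first <- aov(mpg ~ cyl + am, data = mt)
fit_am_first  <- aov(mpg ~ am + cyl, data = mt)

# Rows of each summary table follow the order of terms in the formula
tab1 <- summary(fit_cyl_first)[[1]]
tab2 <- summary(fit_am_first)[[1]]

# Sequential (type I) sum of squares for cyl in each ordering
ss_cyl_first  <- tab1[["Sum Sq"]][1]   # cyl entered first
ss_cyl_second <- tab2[["Sum Sq"]][2]   # cyl entered after am

# The residual sum of squares matches because both models contain the same terms
rss1 <- tab1[["Sum Sq"]][3]
rss2 <- tab2[["Sum Sq"]][3]

print(c(cyl_first = ss_cyl_first, cyl_second = ss_cyl_second))
print(c(rss1 = rss1, rss2 = rss2))
```

Only the split of the explained variation between the two factors changes. The overall fit is the same.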


The aov() and anova() functions in R seem similar, but we actually use them in two different scenarios.

We use aov() when we would like to fit an ANOVA model and view the results in an ANOVA summary table.

We use anova() when we would like to compare the fit of nested regression models to determine if a regression model with a certain set of coefficients offers a significantly better fit than a model with only a subset of the coefficients.
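The two scenarios do overlap in one respect. When anova() receives a single fitted model instead of two nested ones, it returns the same ANOVA table that summary() gives for an aov() fit. Here is a minimal sketch that uses R's built-in PlantGrowth data set purely for illustration (the data set and object names are assumptions of this example):

```r
# Built-in PlantGrowth data: plant weight under a control and two treatments
data(PlantGrowth)

# aov() fits the one-way ANOVA model directly
fit_aov <- aov(weight ~ group, data = PlantGrowth)
aov_table <- summary(fit_aov)[[1]]

# anova() applied to the equivalent lm() fit also returns an ANOVA table
fit_lm <- lm(weight ~ group, data = PlantGrowth)
lm_table <- anova(fit_lm)

# Row 1 of each table is the group term
aov_F <- aov_table[["F value"]][1]
lm_F  <- lm_table[["F value"]][1]
aov_p <- aov_table[["Pr(>F)"]][1]
lm_p  <- lm_table[["Pr(>F)"]][1]

print(c(aov_F = aov_F, lm_F = lm_F))
print(c(aov_p = aov_p, lm_p = lm_p))
```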

The following examples show how to use each function in practice.

Example 1: How to Use aov() in R

Suppose we would like to perform a one-way ANOVA to determine whether three different exercise programs affect weight loss differently.

We recruit 90 people to participate in an experiment and randomly assign 30 of them to each of program A, program B, and program C for one month.

The following code shows how to use the aov() function in R to perform this one-way ANOVA:

#make this example reproducible
set.seed(0)

#create data frame
df <- data.frame(program = rep(c("A", "B", "C"), each=30),
                 weight_loss = c(runif(30, 0, 3),
                                 runif(30, 0, 5),
                                 runif(30, 1, 7)))

#fit one-way anova using aov()
fit <- aov(weight_loss ~ program, data=df)

#view results
summary(fit)

            Df Sum Sq Mean Sq F value   Pr(>F)    
program      2  98.93   49.46   30.83 7.55e-11 ***
Residuals   87 139.57    1.60                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the model output we can see that the p-value for program (7.55e-11, or .0000000000755) is less than .05. This means there is a statistically significant difference in mean weight loss among the three programs: at least one program's mean differs from the others.
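The F value in the summary table is the between-group mean square divided by the within-group mean square. The sketch below recreates the Example 1 data, recomputes F from the group means by hand, and reads the p-value from the summary table. The intermediate variable names (ss_between, ss_within, and so on) are illustrative and not part of the original example.

```r
# Recreate the exercise-program data from Example 1
set.seed(0)
df <- data.frame(program = rep(c("A", "B", "C"), each = 30),
                 weight_loss = c(runif(30, 0, 3),
                                 runif(30, 0, 5),
                                 runif(30, 1, 7)))
fit <- aov(weight_loss ~ program, data = df)

# summary() returns a list whose first element is the ANOVA table
aov_table <- summary(fit)[[1]]
p_value <- aov_table[["Pr(>F)"]][1]   # row 1 is the program term

# Recompute the F statistic by hand from the group means
grand_mean  <- mean(df$weight_loss)
group_means <- tapply(df$weight_loss, df$program, mean)
group_sizes <- tapply(df$weight_loss, df$program, length)

ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((df$weight_loss - group_means[as.character(df$program)])^2)

k <- length(group_means)   # number of groups
n <- nrow(df)              # total sample size
f_stat   <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

cat("F =", round(f_stat, 2), " p =", signif(p_manual, 3), "\n")
cat("Significant at 0.05:", p_value < 0.05, "\n")
```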

Example 2: How to Use anova() in R

Suppose we would like to use number of hours studied to predict exam score for students at a certain college. We may decide to fit the following two regression models:

Full Model: $\text{Score} = \beta_0 + \beta_1(\text{hours}) + \beta_2(\text{hours})^2$

Reduced Model: $\text{Score} = \beta_0 + \beta_1(\text{hours})$
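In R, the quadratic term can be written with raw powers, I(hours^2), or with poly(hours, 2), which is what Example 2 uses. By default poly() builds orthogonal polynomial terms, so its coefficients are not β1 and β2 themselves. The fitted values, and therefore the residual sum of squares, are identical. A quick check on the same simulated data (the model names are illustrative):

```r
# Recreate the hours/score data from Example 2
set.seed(1)
df <- data.frame(hours = runif(50, 5, 15), score = 50)
df$score <- df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

# Full model with raw powers, matching the formula term by term
full_raw  <- lm(score ~ hours + I(hours^2), data = df)

# Full model with orthogonal polynomials, as in Example 2
full_poly <- lm(score ~ poly(hours, 2), data = df)

# The coefficients differ because the predictors are parameterized differently
print(coef(full_raw))
print(coef(full_poly))

# The residual sums of squares agree
rss_raw  <- sum(resid(full_raw)^2)
rss_poly <- sum(resid(full_poly)^2)
print(c(rss_raw = rss_raw, rss_poly = rss_poly))
```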

The following code shows how to use the anova() function in R to perform a lack of fit test to determine if the full model offers a significantly better fit than the reduced model:

#make this example reproducible
set.seed(1)

#create dataset
df <- data.frame(hours = runif(50, 5, 15), score=50)
df$score <- df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

#view head of data
head(df)

      hours    score
1  7.655087 64.30191
2  8.721239 70.65430
3 10.728534 73.66114
4 14.082078 86.14630
5  7.016819 59.81595
6 13.983897 83.60510

#fit full model
full <- lm(score ~ poly(hours,2), data=df)

#fit reduced model
reduced <- lm(score ~ hours, data=df)

#perform lack of fit test using anova()
anova(full, reduced)

Analysis of Variance Table

Model 1: score ~ poly(hours, 2)
Model 2: score ~ hours
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     47 368.48                                
2     48 451.22 -1   -82.744 10.554 0.002144 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value in the output table (.002144) is less than .05, we can reject the null hypothesis that the quadratic term adds nothing to the model (that is, that the reduced model fits as well as the full model). We conclude that the full model offers a statistically significantly better fit than the reduced model.
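The F value in this table can be reproduced from the residual sums of squares (RSS) and residual degrees of freedom of the two models: F = ((RSS_reduced - RSS_full) / (df_reduced - df_full)) / (RSS_full / df_full). With the values shown, that is (451.22 - 368.48) / 1 divided by 368.48 / 47, or about 10.55. The sketch below recomputes it and calls anova(reduced, full) with the smaller model first. That ordering reports positive Df and Sum of Sq, and it gives the same F and p-value as the example's anova(full, reduced). Variable names other than full and reduced are illustrative.

```r
# Recreate the data and both models from Example 2
set.seed(1)
df <- data.frame(hours = runif(50, 5, 15), score = 50)
df$score <- df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

full    <- lm(score ~ poly(hours, 2), data = df)
reduced <- lm(score ~ hours, data = df)

# deviance() of an lm fit is its residual sum of squares
rss_full    <- deviance(full)
rss_reduced <- deviance(reduced)
df_full     <- df.residual(full)
df_reduced  <- df.residual(reduced)

# Partial F statistic for the term in full but not in reduced
f_stat <- ((rss_reduced - rss_full) / (df_reduced - df_full)) /
          (rss_full / df_full)
p_val  <- pf(f_stat, df_reduced - df_full, df_full, lower.tail = FALSE)

# Smaller model first: Df and Sum of Sq come out positive
tab <- anova(reduced, full)
print(tab)
cat("Manual F =", round(f_stat, 3), " p =", signif(p_val, 4), "\n")
```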
