What is the difference between glm and lm in R?

The glm function in R fits generalized linear models, which extend the linear models fit by the lm function. With glm, the response can follow a non-normal distribution (such as binomial or Poisson), and its mean is related to the predictors through a link function. The lm function fits linear models that assume normally distributed errors, which corresponds to the gaussian family with an identity link.


The programming language R offers the following functions for fitting linear models:

1. lm – Used to fit linear models

This function uses the following syntax:

lm(formula, data, …)

where:

  • formula: The formula for the linear model (e.g. y ~ x1 + x2)
  • data: The name of the data frame that contains the data

2. glm – Used to fit generalized linear models

This function uses the following syntax:

glm(formula, family=gaussian, data, …)

where:

  • formula: The formula for the linear model (e.g. y ~ x1 + x2)
  • family: The statistical family to use to fit the model. Default is gaussian but other options include binomial, Gamma, and poisson among others.
  • data: The name of the data frame that contains the data
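As a minimal sketch of how the family argument behaves, the code below fits the same model three ways using the built-in mtcars dataset: with family omitted, with the family given as a character string, and with an explicit call that names the link. The object names (fit_default, fit_name, fit_call) are chosen here only for illustration.

```r
# family omitted: glm() falls back to its default, gaussian
fit_default <- glm(mpg ~ disp + hp, data = mtcars)

# family supplied as a character string naming the family function
fit_name <- glm(mpg ~ disp + hp, data = mtcars, family = "gaussian")

# family supplied as a call, with the link function spelled out
fit_call <- glm(mpg ~ disp + hp, data = mtcars, family = gaussian(link = "identity"))

# compare the coefficients from the three fits
all.equal(coef(fit_default), coef(fit_name))
all.equal(coef(fit_default), coef(fit_call))

# the fitted object records which distribution and link were used
family(fit_call)$family
family(fit_call)$link
```

Both comparisons should return TRUE, since all three calls describe the same gaussian model with an identity link.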

Note that the main difference in syntax between these two functions is the family argument included in the glm() function.

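If you use lm() or glm() with its default gaussian family to fit a linear regression model, they will produce the same coefficient estimates and standard errors, although their summaries report different fit statistics (R-squared and an F-statistic for lm(), deviance and AIC for glm()).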

However, the glm() function can also be used to fit models for non-normal response variables, such as:

  • Logistic regression (family=binomial)
  • Poisson regression (family=poisson)

The following examples show how to use the lm() function and glm() function in practice.

Example of Using the lm() Function

#fit multiple linear regression model
model <- lm(mpg ~ disp + hp, data=mtcars)

#view model summary
summary(model)

Call:
lm(formula = mpg ~ disp + hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7945 -2.3036 -0.8246  1.8582  6.9363 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.735904   1.331566  23.083  < 2e-16 ***
disp        -0.030346   0.007405  -4.098 0.000306 ***
hp          -0.024840   0.013385  -1.856 0.073679 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.127 on 29 degrees of freedom
Multiple R-squared:  0.7482,	Adjusted R-squared:  0.7309 
F-statistic: 43.09 on 2 and 29 DF,  p-value: 2.062e-09

Examples of Using the glm() Function

The following code shows how to fit the exact same linear regression model using the glm() function:

#fit multiple linear regression model
model <- glm(mpg ~ disp + hp, data=mtcars)

#view model summary
summary(model)

Call:
glm(formula = mpg ~ disp + hp, data = mtcars)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.7945  -2.3036  -0.8246   1.8582   6.9363  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.735904   1.331566  23.083  < 2e-16 ***
disp        -0.030346   0.007405  -4.098 0.000306 ***
hp          -0.024840   0.013385  -1.856 0.073679 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 9.775636)

    Null deviance: 1126.05  on 31  degrees of freedom
Residual deviance:  283.49  on 29  degrees of freedom
AIC: 168.62

Number of Fisher Scoring iterations: 2

Notice that the coefficient estimates and their standard errors are exactly the same as those produced by the lm() function.
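Rather than comparing the two summaries by eye, the agreement can be checked in code. The following is a small sketch, with lm_fit and glm_fit as illustrative object names, that compares the coefficient tables and also shows how the glm() dispersion and deviance relate to quantities reported by lm().

```r
# fit the same linear regression with both functions
lm_fit  <- lm(mpg ~ disp + hp, data = mtcars)
glm_fit <- glm(mpg ~ disp + hp, data = mtcars)

# extract the estimate and standard error columns from each summary
cols <- c("Estimate", "Std. Error")
lm_tab  <- coef(summary(lm_fit))[, cols]
glm_tab <- coef(summary(glm_fit))[, cols]

# compare estimates and standard errors (up to numerical tolerance)
all.equal(lm_tab, glm_tab)

# the gaussian dispersion parameter is the squared residual standard error
all.equal(summary(glm_fit)$dispersion, summary(lm_fit)$sigma^2)

# the residual deviance is the residual sum of squares
all.equal(deviance(glm_fit), sum(residuals(lm_fit)^2))
```

Each comparison should return TRUE. This also explains the dispersion parameter of 9.775636 in the glm() output: it is the square of the residual standard error (about 3.127) reported by lm().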

Note that we can also use the glm() function to fit a logistic regression model by specifying family=binomial as follows:

#fit logistic regression model
model <- glm(am ~ disp + hp, data=mtcars, family=binomial)

#view model summary
summary(model)

Call:
glm(formula = am ~ disp + hp, family = binomial, data = mtcars)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9665  -0.3090  -0.0017   0.3934   1.3682  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  1.40342    1.36757   1.026   0.3048  
disp        -0.09518    0.04800  -1.983   0.0474 *
hp           0.12170    0.06777   1.796   0.0725 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 16.713  on 29  degrees of freedom
AIC: 22.713

Number of Fisher Scoring iterations: 8
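The coefficients above are on the log-odds scale, so it can help to convert them into probabilities or odds ratios. The sketch below shows one way to do this; the object names and the two hypothetical cars in new_cars are assumptions made up for illustration and do not come from the dataset.

```r
# refit the logistic regression model
logit_fit <- glm(am ~ disp + hp, data = mtcars, family = binomial)

# type = "response" returns fitted probabilities that am equals 1
# (the default, type = "link", returns values on the log-odds scale)
probs <- predict(logit_fit, type = "response")
range(probs)

# applying the inverse logit to the log-odds gives the same probabilities
all.equal(plogis(predict(logit_fit, type = "link")), probs)

# predicted probabilities for two hypothetical cars (illustrative values only)
new_cars <- data.frame(disp = c(120, 300), hp = c(100, 200))
predict(logit_fit, newdata = new_cars, type = "response")

# exponentiated coefficients are odds ratios for a one-unit increase
exp(coef(logit_fit))
```

The fitted probabilities always fall between 0 and 1, which is the main practical reason to prefer a binomial model over lm() for a binary response.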

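We can also use the glm() function to fit a Poisson regression model by specifying family=poisson. Poisson regression is normally used for count responses; the binary am variable is reused here only to show the syntax: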

#fit Poisson regression model
model <- glm(am ~ disp + hp, data=mtcars, family=poisson)

#view model summary
summary(model)

Call:
glm(formula = am ~ disp + hp, family = poisson, data = mtcars)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1266  -0.4629  -0.2453   0.1797   1.5428  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.214255   0.593463   0.361  0.71808   
disp        -0.018915   0.007072  -2.674  0.00749 **
hp           0.016522   0.007163   2.307  0.02107 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 23.420  on 31  degrees of freedom
Residual deviance: 10.526  on 29  degrees of freedom
AIC: 42.526

Number of Fisher Scoring iterations: 6
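The poisson family uses a log link by default, so the coefficients describe changes in the log of the expected response. As a brief sketch (pois_fit, eta, and mu are illustrative names), the code below confirms the link and shows how the two prediction scales relate.

```r
# refit the Poisson regression model
pois_fit <- glm(am ~ disp + hp, data = mtcars, family = poisson)

# inspect the link function stored with the fitted model
family(pois_fit)$link

# predictions on the link (log) scale and on the response scale
eta <- predict(pois_fit, type = "link")
mu  <- predict(pois_fit, type = "response")

# exponentiating the linear predictor recovers the expected response
all.equal(exp(eta), mu)

# exponentiated coefficients are multiplicative effects on the expected response
exp(coef(pois_fit))
```

Because of the log link, a one-unit increase in a predictor multiplies the expected response by the exponentiated coefficient rather than adding to it.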
