How do I calculate a Confidence Interval for a Regression Intercept?

To calculate a confidence interval for a regression intercept, one first fits the regression model to obtain the estimated intercept and its standard error. Multiplying that standard error by the appropriate critical value from the t-distribution (with n-2 degrees of freedom in simple linear regression), then adding and subtracting the result from the estimate, gives the confidence interval, for example a 95% confidence interval. This interval represents a range of plausible values for the true population intercept.


Simple linear regression is used to quantify the relationship between a predictor variable and a response variable.

This method finds a line that best “fits” a dataset and takes on the following form:

$\hat{y} = b_0 + b_1 x$

where:

  • ŷ: The estimated response value
  • b0: The intercept of the regression line
  • b1: The slope of the regression line
  • x: The value of the predictor variable
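As a sketch of where b0 and b1 come from: the least-squares line uses the standard closed-form estimates b1 = Sxy / Sxx and b0 = ȳ − b1·x̄, where Sxx and Sxy are the centered sums of squares and cross-products (these formulas are standard results, not stated in this article). Applied to the hours and exam-score data from the example later in this article, they reproduce the coefficients that lm() reports. The variable names are illustrative.

```r
# Example data: hours studied (x) and exam score (y) for 15 students
x <- c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14)
y <- c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89)

# Centered sum of squares of x and sum of cross-products of x and y
sxx <- sum((x - mean(x))^2)
sxy <- sum((x - mean(x)) * (y - mean(y)))

# Closed-form least-squares estimates
b1 <- sxy / sxx                 # slope
b0 <- mean(y) - b1 * mean(x)    # intercept: fitted value at x = 0

round(c(b0 = b0, b1 = b1), 3)   # 65.334 and 1.982, as in summary(lm(...))
```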

Often we’re interested in the value of b1, which tells us the average change in the response variable associated with a one-unit increase in the predictor variable.

However, in rare circumstances we’re also interested in the value for b0, which tells us the average value of the response variable when the predictor variable is equal to zero.

We can use the following formula to calculate a confidence interval for the value of β0, the true population intercept:

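Confidence Interval for β0: $b_0 \pm t_{\alpha/2,\,n-2} \cdot \mathrm{se}(b_0)$

where se(b0) is the standard error of the intercept estimate, n is the number of observations, and $t_{\alpha/2,\,n-2}$ is the critical value of the t-distribution with n-2 degrees of freedom for a confidence level of 1 − α.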
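A minimal sketch of this formula as an R function. The helper name ci_intercept is hypothetical, and the illustration uses R's built-in cars dataset (stopping distance vs. speed) rather than this article's data; the result is checked against R's own confint().

```r
# Hypothetical helper: confidence interval for the intercept of a linear
# regression, computed directly as b0 ± t(alpha/2, residual df) * se(b0)
ci_intercept <- function(fit, level = 0.95) {
  est    <- coef(summary(fit))["(Intercept)", "Estimate"]
  se     <- coef(summary(fit))["(Intercept)", "Std. Error"]
  dof    <- df.residual(fit)              # n - 2 for simple linear regression
  alpha  <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = dof)   # upper alpha/2 critical value
  c(lower = est - t_crit * se, upper = est + t_crit * se)
}

# Illustration on R's built-in cars data
fit <- lm(dist ~ speed, data = cars)
ci_intercept(fit)
confint(fit, "(Intercept)", level = 0.95)  # built-in equivalent
```

Reading the degrees of freedom from df.residual() instead of hard-coding n-2 means the same helper also works when the model has more than one predictor, where the residual degrees of freedom are smaller.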

The following example shows how to calculate a confidence interval for an intercept in practice.

Example: Confidence Interval for Regression Intercept

Suppose we’d like to fit a simple linear regression model using hours studied as the predictor variable and exam score as the response variable for 15 students in a particular class.

The following code shows how to fit this simple linear regression model in R:

#create data frame
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))

#fit simple linear regression model
fit <- lm(score ~ hours, data=df)

#view summary of model
summary(fit)

Call:
lm(formula = score ~ hours, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-5.140 -3.219 -1.193  2.816  5.772 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   65.334      2.106  31.023 1.41e-13 ***
hours          1.982      0.248   7.995 2.25e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.641 on 13 degrees of freedom
Multiple R-squared:  0.831,	Adjusted R-squared:  0.818 
F-statistic: 63.91 on 1 and 13 DF,  p-value: 2.253e-06
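The Std. Error of 2.106 for the intercept follows from the standard formula for se(b0). Using the residual standard error s = 3.641 from the output, n = 15, and the values x̄ = 7.6 and Sxx = Σ(x − x̄)² = 215.6 (our own arithmetic on the hours column, not part of the output), it works out as:

$$
\mathrm{se}(b_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}
= 3.641\sqrt{\frac{1}{15} + \frac{7.6^2}{215.6}}
= 3.641\sqrt{0.0667 + 0.2679}
\approx 3.641 \times 0.5784
\approx 2.106
$$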

Using the coefficient estimates in the output, we can write the fitted simple linear regression model as:

Score = 65.334 + 1.982*(Hours Studied)

We can use the following formula to calculate a 95% confidence interval for the intercept:

  • 95% C.I. for β0: $b_0 \pm t_{\alpha/2,\,n-2} \cdot \mathrm{se}(b_0)$
  • 95% C.I. for β0: $65.334 \pm t_{0.05/2,\,15-2} \cdot 2.106$
  • 95% C.I. for β0: $65.334 \pm 2.1604 \cdot 2.106$
  • 95% C.I. for β0: $[60.78,\ 69.88]$

We interpret this to mean that we’re 95% confident that the true population mean exam score for students who study for zero hours is between 60.78 and 69.88.
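In R, confint() computes this interval directly from the fitted model, with no manual lookup of the critical value. Because the intercept is the model's fitted mean score at hours = 0, predict() with interval = "confidence" at hours = 0 returns the same bounds, which matches the interpretation above. A sketch that rebuilds the example model so it runs on its own:

```r
# Example data and model from above
df <- data.frame(hours = c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score = c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data = df)

# Built-in 95% confidence interval for the intercept
ci <- confint(fit, "(Intercept)", level = 0.95)
round(ci, 2)   # 60.78 and 69.88

# The intercept is the fitted mean score at hours = 0, so the confidence
# interval for that mean response has the same bounds
mean_ci <- predict(fit, newdata = data.frame(hours = 0),
                   interval = "confidence", level = 0.95)
all.equal(unname(mean_ci[1, c("lwr", "upr")]), unname(ci[1, ]))   # TRUE
```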

Note: We used the inverse of the t-distribution to find the t critical value (2.1604) that corresponds to a 95% confidence level with 13 degrees of freedom.
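The critical value can also be obtained in R with qt(), the quantile function of the t-distribution, using probability 1 − 0.05/2 = 0.975 and 13 degrees of freedom. Plugging it into the formula with the estimate and standard error from summary(fit) reproduces the hand calculation:

```r
# Upper 2.5% critical value of the t-distribution with 13 degrees of freedom
t_crit <- qt(0.975, df = 13)
round(t_crit, 4)   # 2.1604

# b0 ± t * se(b0), using the values reported by summary(fit)
round(65.334 + c(-1, 1) * t_crit * 2.106, 2)   # 60.78 69.88
```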

Cautions on Calculating a Confidence Interval for a Regression Intercept

We often don’t calculate a confidence interval for a regression intercept in practice because it usually doesn’t make sense to interpret the value of the intercept in a regression model.

For example, suppose we fit a regression model that uses the height of a basketball player as the predictor variable and average points per game as the response variable.

It’s not possible for a player to be zero feet tall, so it wouldn’t make sense to interpret the intercept literally in this model.

There are countless scenarios like this where a predictor variable can’t take on a value of zero, so it doesn’t make sense to interpret the intercept of the model or to create a confidence interval for it.

Other examples include the following potential predictor variables:

  • Square footage of a house
  • Length of a car
  • Weight of a person

None of these predictor variables can take on a value of zero, so it wouldn’t make sense to calculate a confidence interval for the intercept of a regression model that uses any of them.
