What is the difference between a Confidence Interval and a Prediction Interval?

A confidence interval is a range of values that is likely to contain the mean of a population with some degree of confidence, usually based on a sample of data. A prediction interval is a range of values that is likely to contain a single future observation. The additional uncertainty associated with predicting a single value, rather than estimating the mean of a population, makes prediction intervals wider than confidence intervals.


Two types of intervals that are often used in regression analysis are confidence intervals and prediction intervals.

Here’s the difference between the two intervals:

Confidence intervals represent a range of values that are likely to contain the true mean value of some response variable based on specific values of one or more predictor variables.

Prediction intervals represent a range of values that are likely to contain the true value of some response variable for a single new observation based on specific values of one or more predictor variables.

For example, suppose we fit a simple linear regression model that uses the number of bedrooms to predict the selling price of a house:

Price = β0 + β1(number of bedrooms)

If we’d like to estimate the mean selling price of houses with three bedrooms, we would use a confidence interval.

However, if we’d like to estimate the selling price of a specific new home that just came on the market with three bedrooms, we would use a prediction interval.

Note: Since a prediction interval must cover a specific new observation rather than a mean, there's more uncertainty in the estimate. For the same predictor values and confidence level, a prediction interval is therefore always wider than the corresponding confidence interval.

Confidence Interval vs. Prediction Interval: Difference in Formulas

We use the following formula to calculate a confidence interval:

$$\hat{y}_0 \pm t_{\alpha/2,\,n-2} \cdot S_{yx} \sqrt{\frac{(x_0 - \bar{x})^2}{SS_x} + \frac{1}{n}}$$

We use the following formula to calculate a prediction interval:

$$\hat{y}_0 \pm t_{\alpha/2,\,n-2} \cdot S_{yx} \sqrt{\frac{(x_0 - \bar{x})^2}{SS_x} + \frac{1}{n} + 1}$$

where:

  • ŷ0: Estimated mean value of response variable
  • tα/2,n-2: t-critical value with n-2 degrees of freedom
  • Syx: Residual standard error of the regression (the standard error of the estimate)
  • x0: specific value of predictor variable 
  • x̄: mean value of predictor variable
  • SSx: Sum of squares for predictor variable
  • n: Total sample size

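Notice that the formula for a prediction interval contains an extra 1 under the square root. This term accounts for the scatter of an individual observation around the mean, so the standard error used for a prediction interval is always larger than the one used for a confidence interval.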
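The two formulas above can be checked by hand in R. The following is a minimal sketch, not a definitive implementation: manual_intervals is a hypothetical helper name (it is not part of base R), and R's built-in cars dataset is used only as placeholder data.

```r
# Manual confidence and prediction intervals for simple linear regression.
# `manual_intervals` is a hypothetical helper, not a base R function.
manual_intervals <- function(x, y, x0, level = 0.95) {
  n      <- length(x)
  fit    <- lm(y ~ x)
  b      <- coef(fit)
  y0     <- unname(b[1] + b[2] * x0)           # estimated mean response at x0
  s_yx   <- sqrt(sum(resid(fit)^2) / (n - 2))  # residual standard error
  ss_x   <- sum((x - mean(x))^2)               # sum of squares for the predictor
  t_crit <- qt(1 - (1 - level) / 2, df = n - 2)

  # the only difference between the two is the extra "+ 1" under the root
  se_mean <- s_yx * sqrt((x0 - mean(x))^2 / ss_x + 1 / n)
  se_pred <- s_yx * sqrt((x0 - mean(x))^2 / ss_x + 1 / n + 1)

  rbind(
    confidence = c(fit = y0, lwr = y0 - t_crit * se_mean, upr = y0 + t_crit * se_mean),
    prediction = c(fit = y0, lwr = y0 - t_crit * se_pred, upr = y0 + t_crit * se_pred)
  )
}

# placeholder data: built-in `cars` dataset (speed vs. stopping distance)
res <- manual_intervals(cars$speed, cars$dist, x0 = 15)
res
```

Both rows share the same fitted value; only the half-width differs, and the prediction row is the wider of the two.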

Example: Interpreting Confidence Intervals vs. Prediction Intervals

Suppose we have a dataset that shows the number of bedrooms and the selling price (in thousands of dollars) for 20 houses in a particular neighborhood.

Now suppose we fit a simple linear regression model to this dataset in R:

#define data
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                        3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

#fit simple linear regression model
model <- lm(price~beds, data=df)

#view model fit
summary(model)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   39.450     13.248   2.978  0.00807 ** 
beds          70.667      4.031  17.529 9.26e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.19 on 18 degrees of freedom
Multiple R-squared:  0.9447,	Adjusted R-squared:  0.9416 
F-statistic: 307.3 on 1 and 18 DF,  p-value: 9.257e-13

The fitted regression model turns out to be:

Selling price (thousands) = 39.450 + 70.667(number of bedrooms)

We can use the following code to calculate a confidence interval for the mean selling price of houses that have three bedrooms:

#define new house
new <- data.frame(beds=c(3))

#confidence interval for mean selling price of house with 3 bedrooms
predict(model, newdata = new, interval = "confidence")

     fit     lwr     upr
1 251.45 240.087 262.813

The 95% confidence interval for the mean selling price of houses with three bedrooms is [$240k, $263k].
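The interval is a 95% interval because 0.95 is the default value of the level argument of predict() for lm models. As a short sketch, the same call can be repeated with other confidence levels; the data and model are redefined here only so the snippet runs on its own.

```r
# same data and model as in the example
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                        3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))
model <- lm(price~beds, data=df)
new <- data.frame(beds=c(3))

# confidence intervals for the mean price at 90% and 99% confidence
ci_90 <- predict(model, newdata = new, interval = "confidence", level = 0.90)
ci_99 <- predict(model, newdata = new, interval = "confidence", level = 0.99)

# stack the two results for comparison
rbind("90%" = ci_90[1, ], "99%" = ci_99[1, ])
```

A higher confidence level gives a wider interval around the same fitted value.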

We can then use the following code to calculate a prediction interval for the selling price of a new house that just came on the market that has three bedrooms:

#define new house
new <- data.frame(beds=c(3))

#prediction interval for selling price of a new house with 3 bedrooms
predict(model, newdata = new, interval = "prediction")

     fit      lwr      upr
1 251.45 199.3783 303.5217

The 95% prediction interval for the selling price of a new house with three bedrooms is [$199k, $304k].

Notice that the prediction interval is much wider than the confidence interval because there is more uncertainty around the selling price of a single new house as opposed to the mean selling price of all houses with three bedrooms.
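To see that this holds across the whole range of the data and not only at three bedrooms, the widths of both intervals can be compared for every bedroom count from 1 to 6. This is an illustrative sketch; the names grid and widths are chosen here and are not part of the original example.

```r
# same data and model as in the example
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                        3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))
model <- lm(price~beds, data=df)

# one row per bedroom count observed in the data
grid <- data.frame(beds = 1:6)

ci <- predict(model, newdata = grid, interval = "confidence")
pred <- predict(model, newdata = grid, interval = "prediction")

# width of each 95% interval (upper bound minus lower bound)
widths <- data.frame(beds     = grid$beds,
                     ci_width = ci[, "upr"] - ci[, "lwr"],
                     pi_width = pred[, "upr"] - pred[, "lwr"])
widths
```

Both intervals are narrowest at three bedrooms, which is the mean of the predictor in this dataset, and they widen as the bedroom count moves away from it. The prediction interval is wider than the confidence interval in every row.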
