How can I perform bootstrapping in R, and what are some examples of its implementation?

Bootstrapping is a statistical resampling technique used to estimate the variability and uncertainty of a population parameter by repeatedly sampling from a given dataset. In R, bootstrapping can be performed using the “boot” package, which allows for the creation of a large number of resampled datasets from the original data. These resampled datasets can then be used to calculate the desired statistic and generate a confidence interval for its true value. Some examples of bootstrapping in R include estimating the mean or median of a population, creating a confidence interval for a regression coefficient, or comparing the means of two groups. Bootstrapping is a useful tool in situations where traditional statistical methods may not be appropriate, such as when the underlying data is not normally distributed or when the sample size is small. It is a versatile and powerful technique that can provide valuable insights into the uncertainty of our data.

Perform Bootstrapping in R (With Examples)


Bootstrapping is a method that can be used to estimate the standard error of any statistic and produce a confidence interval for the statistic.

The basic process for bootstrapping is as follows:

  • Take k repeated samples with replacement from a given dataset.
  • For each sample, calculate the statistic you’re interested in.
  • This results in k different estimates for a given statistic, which you can then use to calculate the standard error of the statistic and create a confidence interval for the statistic.

We can perform bootstrapping in R by using the following functions from the boot library:

1. Generate bootstrap samples.

boot(data, statistic, R, …)

where:

  • data: A vector, matrix, or data frame
  • statistic: A function that produces the statistic(s) to be bootstrapped
  • R: Number of bootstrap replicates 

2. Generate a bootstrapped confidence interval.

boot.ci(bootobject, conf, type)

where:

  • bootobject: An object returned by the boot() function
  • conf: The confidence interval to calculate. Default is 0.95
  • type: Type of confidence interval to calculate. Options include “norm”, “basic”, “stud”, “perc”, “bca” and “all” – Default is “all”

The following examples show how to use these functions in practice.

Example 1: Bootstrap a Single Statistic

The following code shows how to calculate the standard error for the R-squared of a simple linear regression model:

set.seed(0)
library(boot)

#define function to calculate R-squared
rsq_function <- function(formula, data, indices) {
  d <- data[indices,] #allows boot to select sample
  fit <- lm(formula, data=d) #fit regression modelreturn(summary(fit)$r.square) #return R-squared of model
}
#perform bootstrapping with 2000 replications
reps <- boot(data=mtcars, statistic=rsq_function, R=2000, formula=mpg~disp)

#view results of boostrapping
reps

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = mtcars, statistic = rsq_function, R = 2000, formula = mpg ~ 
    disp)


Bootstrap Statistics :
     original      bias    std. error
t1* 0.7183433 0.002164339  0.06513426

From the results we can see:

  • The estimated R-squared for this regression model is 0.7183433.
  • The standard error for this estimate is 0.06513426.

We can quickly view the distribution of the bootstrapped samples as well:

plot(reps)

Histogram of bootstrapped samples in R

We can also use the following code to calculate the 95% confidence interval for the estimated R-squared of the model:

#calculate adjusted bootstrap percentile (BCa) interval
boot.ci(reps, type="bca")

CALL : 
boot.ci(boot.out = reps, type = "bca")

Intervals : 
Level       BCa          
95%   ( 0.5350,  0.8188 )  
Calculations and Intervals on Original Scale

From the output we can see that the 95% bootstrapped confidence interval for the true R-squared values is (.5350, .8188).

Example 2: Bootstrap Multiple Statistics

The following code shows how to calculate the standard error for each coefficient in a multiple linear regression model:

set.seed(0)
library(boot)

#define function to calculate fitted regression coefficients
coef_function <- function(formula, data, indices) {
  d <- data[indices,] #allows boot to select sample
  fit <- lm(formula, data=d) #fit regression modelreturn(coef(fit)) #return coefficient estimates of model
}

#perform bootstrapping with 2000 replications
reps <- boot(data=mtcars, statistic=coef_function, R=2000, formula=mpg~disp)

#view results of boostrapping
reps

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = mtcars, statistic = coef_function, R = 2000, formula = mpg ~ 
    disp)


Bootstrap Statistics :
       original        bias    std. error
t1* 29.59985476 -5.058601e-02  1.49354577
t2* -0.04121512  6.549384e-05  0.00527082

From the results we can see:

  • The estimated coefficient for the intercept of the model is 29.59985476 and the standard error of this estimate is 1.49354577.
  • The estimated coefficient for the predictor variable disp in the model is -0.04121512 and the standard error of this estimate is 0.00527082.

We can quickly view the distribution of the bootstrapped samples as well:

plot(reps, index=1) #intercept of model
plot(reps, index=2) #disp predictor variable

Bootstrapping in R

We can also use the following code to calculate the 95% confidence intervals for each coefficient:

#calculate adjusted bootstrap percentile (BCa) intervals
boot.ci(reps, type="bca", index=1) #intercept of model
boot.ci(reps, type="bca", index=2) #disp predictor variable

CALL : 
boot.ci(boot.out = reps, type = "bca", index = 1)

Intervals : 
Level       BCa          
95%   (26.78, 32.66 )  
Calculations and Intervals on Original Scale
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL : 
boot.ci(boot.out = reps, type = "bca", index = 2)

Intervals : 
Level       BCa          
95%   (-0.0520, -0.0312 )  
Calculations and Intervals on Original Scale

From the output we can see that the 95% bootstrapped confidence intervals for the model coefficients are as follows:

  • C.I. for intercept: (26.78, 32.66)
  • C.I. for disp: (-.0520, -.0312)

Additional Resources

How to Perform Simple Linear Regression in R
How to Perform Multiple Linear Regression in R
Introduction to Confidence Intervals

x