Confidence intervals are a statistical tool used to estimate a range of plausible values for a population parameter. In R, confidence intervals can be found using functions such as confint() and t.test(), or calculated by hand from summary statistics. These approaches take a sample of data and calculate the confidence interval based on a specified confidence level. For example, a 95% confidence level means that if we repeated the sampling procedure many times, about 95% of the intervals constructed this way would contain the true population parameter. Confidence intervals are closely tied to hypothesis testing: if a 95% confidence interval does not contain the value stated in the null hypothesis, a two-sided test at the 5% significance level rejects that null hypothesis. They are also widely used to analyze data in fields such as market research, the social sciences, and medical studies.
Find Confidence Intervals in R (With Examples)
A confidence interval is a range of values that is likely to contain a population parameter with a certain level of confidence.
It is calculated using the following general formula:
Confidence Interval = (point estimate) +/- (critical value)*(standard error)
This formula creates an interval with a lower bound and an upper bound, which likely contains a population parameter with a certain level of confidence:
Confidence Interval = [lower bound, upper bound]
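As a rough illustration of the general formula, the sketch below wraps it in a small helper. The name ci_bounds and its arguments are made up for this example and are not part of base R, the numbers are purely illustrative, and the critical value comes from qnorm(), which assumes a normal sampling distribution.

```r
# Hypothetical helper (not a base R function): builds an interval from
# a point estimate, a critical value, and a standard error.
ci_bounds <- function(estimate, critical, se) {
  margin <- critical * se
  c(lower = estimate - margin, upper = estimate + margin)
}

# Two-sided critical value for a given confidence level,
# assuming a normal sampling distribution
conf_level <- 0.95
alpha <- 1 - conf_level
z <- qnorm(1 - alpha / 2)

# Illustrative numbers only: an estimate of 50 with a standard error of 2
ci_example <- ci_bounds(estimate = 50, critical = z, se = 2)
ci_example
```

Changing conf_level changes the critical value and therefore the width of the interval: a higher confidence level gives a wider interval.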
This tutorial explains how to calculate the following confidence intervals in R:
1. Confidence Interval for a Mean
2. Confidence Interval for a Difference in Means
3. Confidence Interval for a Proportion
4. Confidence Interval for a Difference in Proportions
Let’s jump in!
Example 1: Confidence Interval for a Mean
We use the following formula to calculate a confidence interval for a mean:
Confidence Interval = x +/- t_{n-1, 1-α/2} * (s/√n)
where:
- x: sample mean
- t: the t-critical value with n-1 degrees of freedom for a confidence level of 1-α
- s: sample standard deviation
- n: sample size
Example: Suppose we collect a random sample of turtles with the following information:
- Sample size n = 25
- Sample mean weight x = 300
- Sample standard deviation s = 18.5
The following code shows how to calculate a 95% confidence interval for the true population mean weight of turtles:
#input sample size, sample mean, and sample standard deviation
n <- 25
xbar <- 300
s <- 18.5

#calculate margin of error
margin <- qt(0.975,df=n-1)*s/sqrt(n)

#calculate lower and upper bounds of confidence interval
low <- xbar - margin
low

[1] 292.3636

high <- xbar + margin
high

[1] 307.6364
The 95% confidence interval for the true population mean weight of turtles is [292.36, 307.64].
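When the raw observations are available rather than just summary statistics, the built-in functions can do the same calculation. The sketch below uses simulated weights (hypothetical data, not the turtle sample above) to check that the manual formula, t.test(), and confint() on an intercept-only model agree.

```r
# Hypothetical raw data: 25 simulated turtle weights
# (illustrative only, not the sample described above)
set.seed(1)
weights <- rnorm(25, mean = 300, sd = 18.5)

# Manual interval, following the formula for a mean
n <- length(weights)
margin <- qt(0.975, df = n - 1) * sd(weights) / sqrt(n)
manual_ci <- c(mean(weights) - margin, mean(weights) + margin)

# Same interval from t.test()
ttest_ci <- t.test(weights, conf.level = 0.95)$conf.int

# Same interval again from confint() on an intercept-only linear model
lm_ci <- confint(lm(weights ~ 1), level = 0.95)

manual_ci
ttest_ci
lm_ci
```

All three approaches should report the same lower and upper bounds, since each one uses the t-distribution with n-1 degrees of freedom.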
Example 2: Confidence Interval for a Difference in Means
We use the following formula to calculate a confidence interval for a difference in population means:
Confidence interval = (x1–x2) +/- t*√((sp^2/n1) + (sp^2/n2))
where:
- x1, x2: sample 1 mean, sample 2 mean
- t: the t-critical value based on the confidence level and (n1+n2-2) degrees of freedom
- sp^2: pooled variance, calculated as ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2)
- s1, s2: sample 1 standard deviation, sample 2 standard deviation
- n1, n2: sample 1 size, sample 2 size
Example: Suppose we want to estimate the difference in mean weight between two different species of turtles, so we go out and gather a random sample of 15 turtles from each population. Here is the summary data for each sample:
Sample 1:
- x1 = 310
- s1 = 18.5
- n1 = 15
Sample 2:
- x2 = 300
- s2 = 16.4
- n2 = 15
The following code shows how to calculate a 95% confidence interval for the true difference in population means:
#input sample size, sample mean, and sample standard deviation
n1 <- 15
xbar1 <- 310
s1 <- 18.5
n2 <- 15
xbar2 <- 300
s2 <- 16.4

#calculate pooled variance
sp2 <- ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2)

#calculate margin of error using n1+n2-2 degrees of freedom
margin <- qt(0.975,df=n1+n2-2)*sqrt(sp2/n1 + sp2/n2)

#calculate lower and upper bounds of confidence interval
low <- (xbar1-xbar2) - margin
low

[1] -3.075728

high <- (xbar1-xbar2) + margin
high

[1] 23.07573

The 95% confidence interval for the true difference in population means is [-3.08, 23.08]. Since this interval contains 0, the data are consistent with no difference in mean weight between the two species.
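With raw data in hand, t.test() can produce the pooled-variance interval directly. The sketch below uses simulated weights for two species (hypothetical data, not the samples above) and compares the manual calculation, based on n1+n2-2 degrees of freedom, with t.test() when var.equal = TRUE.

```r
# Hypothetical raw data for two species
# (simulated for illustration, not the samples described above)
set.seed(2)
species1 <- rnorm(15, mean = 310, sd = 18.5)
species2 <- rnorm(15, mean = 300, sd = 16.4)

n1 <- length(species1)
n2 <- length(species2)

# Pooled variance and manual interval with n1 + n2 - 2 degrees of freedom
sp2 <- ((n1 - 1) * var(species1) + (n2 - 1) * var(species2)) / (n1 + n2 - 2)
margin <- qt(0.975, df = n1 + n2 - 2) * sqrt(sp2 / n1 + sp2 / n2)
diff_means <- mean(species1) - mean(species2)
manual_ci <- c(diff_means - margin, diff_means + margin)

# var.equal = TRUE requests the pooled-variance interval
pooled_ci <- t.test(species1, species2, var.equal = TRUE)$conf.int

# The default (var.equal = FALSE) is the Welch interval, which does not
# assume equal variances and will generally differ slightly
welch_ci <- t.test(species1, species2)$conf.int

manual_ci
pooled_ci
welch_ci
```

Note that t.test() defaults to the Welch interval, so var.equal = TRUE has to be set explicitly to reproduce the pooled formula used in this example.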
Example 3: Confidence Interval for a Proportion
We use the following formula to calculate a confidence interval for a proportion:
Confidence Interval = p +/- z*√(p(1-p)/n)
where:
- p: sample proportion
- z: the z-critical value based on the confidence level
- n: sample size
Example: Suppose we want to estimate the proportion of residents in a county that are in favor of a certain law. We select a random sample of 100 residents and ask them about their stance on the law. Here are the results:
- Sample size n = 100
- Proportion in favor of law p = 0.56
The following code shows how to calculate a 95% confidence interval for the true proportion of residents in the entire county who are in favor of the law:
#input sample size and sample proportion
n <- 100
p <- .56

#calculate margin of error
margin <- qnorm(0.975)*sqrt(p*(1-p)/n)

#calculate lower and upper bounds of confidence interval
low <- p - margin
low

[1] 0.4627099

high <- p + margin
high

[1] 0.6572901
The 95% confidence interval for the true proportion of residents in the entire county who are in favor of the law is [.463, .657].
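The formula used in this example is often called the Wald interval. Base R also offers prop.test() and binom.test(), but these use different methods (a Wilson score interval and an exact Clopper-Pearson interval, respectively), so their bounds will not match the manual result exactly. The sketch below assumes the sample proportion of 0.56 corresponds to 56 of the 100 residents and puts the three side by side.

```r
x <- 56    # residents in favor, assuming p = 0.56 means 56 out of 100
n <- 100

# Wald interval, as in the formula for a proportion
p <- x / n
margin <- qnorm(0.975) * sqrt(p * (1 - p) / n)
wald_ci <- c(p - margin, p + margin)

# Wilson score interval (prop.test without continuity correction)
wilson_ci <- prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int

# Exact (Clopper-Pearson) interval
exact_ci <- binom.test(x, n, conf.level = 0.95)$conf.int

wald_ci
wilson_ci
exact_ci
```

The three intervals are usually close for moderate sample sizes and proportions away from 0 or 1; the differences tend to matter more for small samples or extreme proportions.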
Example 4: Confidence Interval for a Difference in Proportions
We use the following formula to calculate a confidence interval for a difference in proportions:
Confidence interval = (p1–p2) +/- z*√(p1(1-p1)/n1 + p2(1-p2)/n2)
where:
- p1, p2: sample 1 proportion, sample 2 proportion
- z: the z-critical value based on the confidence level
- n1, n2: sample 1 size, sample 2 size
Example: Suppose we want to estimate the difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B. Here is the summary data for each sample:
Sample 1:
- n1 = 100
- p1 = 0.62 (i.e. 62 out of 100 residents support the law)
Sample 2:
- n2 = 100
- p2 = 0.46 (i.e. 46 out of 100 residents support the law)
The following code shows how to calculate a 95% confidence interval for the true difference in proportion of residents who support the law between the counties:
#input sample sizes and sample proportions
n1 <- 100
p1 <- .62
n2 <- 100
p2 <- .46

#calculate margin of error
margin <- qnorm(0.975)*sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)

#calculate lower and upper bounds of confidence interval
low <- (p1-p2) - margin
low

[1] 0.02364509

high <- (p1-p2) + margin
high

[1] 0.2963549
The 95% confidence interval for the true difference in proportion of residents who support the law between the counties is [.024, .296].
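As a cross-check, prop.test() can compute an interval for a difference in two proportions from the counts. To our understanding, with correct = FALSE (no continuity correction) it uses the same formula as above, so the sketch below compares it against the manual calculation using the counts 62 out of 100 and 46 out of 100.

```r
# Counts of supporters and sample sizes for county A and county B
successes <- c(62, 46)
totals <- c(100, 100)

# Manual interval, following the formula for a difference in proportions
p_hat <- successes / totals
margin <- qnorm(0.975) * sqrt(sum(p_hat * (1 - p_hat) / totals))
diff_props <- p_hat[1] - p_hat[2]
manual_ci <- c(diff_props - margin, diff_props + margin)

# Interval from prop.test() without continuity correction
proptest_ci <- prop.test(successes, totals, conf.level = 0.95,
                         correct = FALSE)$conf.int

manual_ci
proptest_ci
```

Leaving correct at its default of TRUE applies a continuity correction, which makes the interval slightly wider than the manual one.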