What is the confidence interval for the difference in proportions?

The confidence interval for the difference in proportions is a range of values, computed from sample data, that is likely to contain the true difference between two population proportions at a specified level of confidence. It is calculated from the two sample proportions, the two sample sizes, and the chosen confidence level, which together determine the margin of error. It is commonly used in research and data analysis to compare two groups and to judge whether an observed difference is larger than what random sampling variation alone would plausibly produce.

Confidence Interval for the Difference in Proportions


A confidence interval (C.I.) for a difference in proportions is a range of values that is likely to contain the true difference between two population proportions with a certain level of confidence.

This tutorial explains the following:

  • The motivation for creating this confidence interval.
  • The formula to create this confidence interval.
  • An example of how to calculate this confidence interval.
  • How to interpret this confidence interval.

C.I. for the Difference in Proportions: Motivation

Often researchers are interested in estimating the difference between two population proportions. To estimate this difference, they’ll go out and gather a random sample from each population and calculate the proportion for each sample. Then, they can compare the difference between the two proportions.

However, they can’t know for sure whether the difference in the sample proportions matches the true difference in the population proportions, which is why they may create a confidence interval for the difference between the two proportions. This provides a range of values that is likely to contain the true difference between the population proportions.

For example, suppose we want to estimate the difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B.

Since there are thousands of residents in each county, it would take too long and be too costly to go around and survey every individual resident in each county.

Instead, we might take a random sample of residents from each county and use the proportion in favor of the law in each sample to estimate the true difference in proportions between the two counties.

Since our samples are random, the difference in proportions between the two samples is not guaranteed to exactly match the difference in proportions between the two populations. To capture this uncertainty, we can create a confidence interval: a range of values that is likely to contain the true difference in proportions between the two populations.

C.I. for the Difference in Proportions: Formula

We use the following formula to calculate a confidence interval for a difference between two population proportions:

Confidence interval = (p1 – p2)  +/-  z*√(p1(1-p1)/n1 + p2(1-p2)/n2)

where:

  • p1, p2: sample 1 proportion, sample 2 proportion
  • z: the z-critical value based on the confidence level
  • n1, n2: sample 1 size, sample 2 size
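The formula above can be sketched as a small Python function. This is a minimal illustration rather than a reference implementation; the function name diff_proportion_ci and its argument order are assumptions made for this example.

```python
import math

def diff_proportion_ci(p1, n1, p2, n2, z):
    """Return (lower, upper) for (p1 - p2) +/- z * standard error.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Standard error of the difference between two independent sample proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Margin of error is the z-critical value times the standard error
    margin = z * se
    diff = p1 - p2
    return diff - margin, diff + margin
```

The function takes the z-critical value directly, so the caller decides which confidence level to use.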

The z-value that you will use is dependent on the confidence level that you choose. The following table shows the z-value that corresponds to popular confidence level choices:

Confidence Level   z-value
0.90               1.645
0.95               1.96
0.99               2.576

Notice that higher confidence levels correspond to larger z-values, which leads to wider confidence intervals. This means that, for example, a 95% confidence interval will be wider than a 90% confidence interval for the same set of data.
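If a confidence level is not in the table, the z-value can be computed from the standard normal distribution. The sketch below uses Python's standard library; the helper name z_critical is an assumption made for this example.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided z-critical value for a confidence level such as 0.95.

    Hypothetical helper: the name is illustrative only.
    """
    # Split the leftover probability (alpha) equally between the two tails
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.2f} -> z = {z_critical(level):.3f}")
```

To three decimal places the values are 1.645, 1.960, and 2.576; the last one is often rounded to 2.58.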

C.I. for the Difference in Proportions: Example

Suppose we want to estimate the difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B. Here is the summary data for each sample:

Sample 1:

  • n1 = 100
  • p1 = 0.62 (i.e. 62 out of 100 residents support the law)

Sample 2:

  • n2 = 100
  • p2 = 0.46 (i.e. 46 out of 100 residents support the law)

Here is how to find various confidence intervals for the difference in population proportions:

90% Confidence Interval:

(.62-.46) +/- 1.645*√(.62(1-.62)/100 + .46(1-.46)/100) =  [.0456, .2744]

95% Confidence Interval:

(.62-.46) +/- 1.96*√(.62(1-.62)/100 + .46(1-.46)/100) =  [.0236, .2964]

99% Confidence Interval:

(.62-.46) +/- 2.576*√(.62(1-.62)/100 + .46(1-.46)/100) =  [-.0192, .3392]
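The three calculations above can be checked with a short Python script. This is a sketch under the assumption that the z-value is taken from the standard normal distribution rather than a rounded table entry; the function name ci_for_level is hypothetical.

```python
import math
from statistics import NormalDist

def ci_for_level(p1, n1, p2, n2, confidence_level):
    """Interval for p1 - p2 at the given confidence level (illustrative helper)."""
    # Two-sided z-critical value from the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Standard error of the difference in sample proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Summary data from the example: 62 of 100 in sample 1, 46 of 100 in sample 2
intervals = {}
for level in (0.90, 0.95, 0.99):
    lower, upper = ci_for_level(0.62, 100, 0.46, 100, level)
    intervals[level] = (lower, upper)
    print(f"{level:.0%} C.I.: [{lower:.4f}, {upper:.4f}]")
```

Rounded to four decimal places, the printed bounds should agree with the hand calculations above.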

Note: You can also find these confidence intervals by using a calculator or statistical software.

C.I. for the Difference in Proportions: Interpretation

The way we would interpret a confidence interval is as follows:

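We are 95% confident that the interval [.0236, .2964] contains the true difference in the proportion of residents who favor the law between the two counties. In other words, if we repeated this sampling procedure many times, about 95% of the intervals built this way would contain the true difference.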

Since this interval does not contain the value “0”, the data provide evidence that there is a true difference in the proportion of residents who support this law in county A compared to county B.
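One way to see what 95% confidence means in practice is to simulate many pairs of samples and count how often the resulting interval contains the true difference. The sketch below is only an illustration: the true proportions of 0.62 and 0.46, the number of trials, the random seed, and the function names are all assumptions chosen for the example, not facts about the two counties.

```python
import math
import random

def ci_95(x1, n1, x2, n2, z=1.96):
    """95% interval for p1 - p2 from success counts x1 and x2 (illustrative helper)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se

def simulate_coverage(true_p1, true_p2, n1, n2, trials=2000, seed=0):
    """Share of simulated intervals that contain the true difference."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_diff = true_p1 - true_p2
    hits = 0
    for _ in range(trials):
        # Draw one random sample from each hypothetical population
        x1 = sum(rng.random() < true_p1 for _ in range(n1))
        x2 = sum(rng.random() < true_p2 for _ in range(n2))
        lower, upper = ci_95(x1, n1, x2, n2)
        if lower <= true_diff <= upper:
            hits += 1
    return hits / trials

# Assumed true proportions, used only to drive the simulation
coverage = simulate_coverage(0.62, 0.46, 100, 100)
print(f"Share of intervals containing the true difference: {coverage:.3f}")
```

With samples of this size, the share should land close to 0.95, which is what the confidence level describes: a property of the procedure across repeated samples.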
