What is a Confidence Interval for the Difference in Proportions?

A confidence interval (C.I.) for a difference in proportions is a range of values that is likely to contain the true difference between two population proportions with a certain level of confidence. It quantifies the uncertainty in the estimated difference and is calculated by taking the difference between the two sample proportions and then adding and subtracting a margin of error. The margin of error depends on the confidence level and on the standard error of the difference, which in turn depends on the two sample sizes.

This tutorial explains the following:

  • The motivation for creating this confidence interval.
  • The formula to create this confidence interval.
  • An example of how to calculate this confidence interval.
  • How to interpret this confidence interval.

C.I. for the Difference in Proportions: Motivation

Often researchers are interested in estimating the difference between two population proportions. To estimate this difference, they’ll go out and gather a random sample from each population and calculate the proportion for each sample. Then, they can compare the difference between the two proportions.

However, they can’t know for sure whether the difference in the sample proportions matches the true difference in the population proportions, which is why they may create a confidence interval for the difference between the two proportions. This provides a range of values that is likely to contain the true difference between the population proportions.

For example, suppose we want to estimate the difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B.

Since there are thousands of residents in each county, it would take too long and be too costly to go around and survey every individual resident in each county.

Instead, we might take a random sample of residents from each county and use the proportion in favor of the law in each sample to estimate the true difference in proportions between the two counties.

Since our samples are random, the difference in proportions between the two samples is not guaranteed to exactly match the difference in proportions between the two populations. To capture this uncertainty, we can create a confidence interval: a range of values that is likely to contain the true difference in proportions between the two populations.

C.I. for the Difference in Proportions: Formula

We use the following formula to calculate a confidence interval for a difference between two population proportions:

Confidence interval = (p1-p2) +/- z*√(p1(1-p1)/n1 + p2(1-p2)/n2)

where:

  • p1, p2: sample 1 proportion, sample 2 proportion
  • z: the z-critical value based on the confidence level
  • n1, n2: sample 1 size, sample 2 size
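The formula above can be sketched as a small Python function. This is a minimal illustration rather than a definitive implementation: the function name diff_proportion_ci and the input values in the call are hypothetical, chosen only to show how the pieces fit together.

```python
import math

def diff_proportion_ci(p1, n1, p2, n2, z):
    """Return (lower, upper) bounds of the C.I. for p1 - p2."""
    # Standard error of the difference between two independent sample proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Margin of error: z-critical value times the standard error
    margin = z * se
    diff = p1 - p2
    return diff - margin, diff + margin

# Hypothetical inputs, used only to illustrate the call
lower, upper = diff_proportion_ci(p1=0.50, n1=200, p2=0.40, n2=200, z=1.96)
print(round(lower, 4), round(upper, 4))
```

Keeping z as an argument means the same function works for any confidence level.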

The z-value that you will use is dependent on the confidence level that you choose. The following table shows the z-value that corresponds to popular confidence level choices:

Confidence Level    z-value
0.90                1.645
0.95                1.96
0.99                2.58

Notice that higher confidence levels correspond to larger z-values, which leads to wider confidence intervals. This means that, for example, a 95% confidence interval will be wider than a 90% confidence interval for the same set of data.
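If you need a z-value for a confidence level that is not in the table, it can be computed from the standard normal distribution. The sketch below uses NormalDist from Python's standard library (available in Python 3.8 and later); the helper name z_critical is our own, not part of any library.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided z-critical value for a given confidence level."""
    # Split the leftover probability equally between the two tails
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

For the 0.99 level this gives roughly 2.576, which the table rounds to 2.58.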

C.I. for the Difference in Proportions: Example

Suppose we want to estimate the difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B. Here is the summary data for each sample:

Sample 1:

  • n1 = 100
  • p1 = 0.62 (i.e. 62 out of 100 residents support the law)

Sample 2:

  • n2 = 100
  • p2 = 0.46 (i.e. 46 out of 100 residents support the law)

Here is how to find various confidence intervals for the difference in population proportions:

90% Confidence Interval:

(.62-.46) +/- 1.645*√(.62(1-.62)/100 + .46(1-.46)/100) =  [.0456, .2744]

95% Confidence Interval:

(.62-.46) +/- 1.96*√(.62(1-.62)/100 + .46(1-.46)/100) =  [.0236, .2964]

99% Confidence Interval:

(.62-.46) +/- 2.58*√(.62(1-.62)/100 + .46(1-.46)/100) =  [-.0195, .3395]

Note: You can also find these confidence intervals by using a calculator or statistical software.
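As a check on the hand calculations, the same arithmetic can be scripted. This is a minimal sketch that hard-codes the sample data from this example and the rounded z-values from the table; using a z-value with more decimal places (for instance 2.576 instead of 2.58) shifts the last digits of the 99% interval slightly.

```python
import math

# Summary data from the example
n1, p1 = 100, 0.62
n2, p2 = 100, 0.46

# Standard error of the difference in sample proportions
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff = p1 - p2

intervals = {}
# z-values taken from the table (rounded)
for level, z in ((0.90, 1.645), (0.95, 1.96), (0.99, 2.58)):
    margin = z * se
    intervals[level] = (round(diff - margin, 4), round(diff + margin, 4))
    lower, upper = intervals[level]
    print(f"{level:.0%} CI: [{lower:.4f}, {upper:.4f}]")
```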

C.I. for the Difference in Proportions: Interpretation

The way we would interpret a confidence interval is as follows:

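We are 95% confident that the interval [.0236, .2964] contains the true difference in the proportion of residents who favor the law between the two counties. In other words, if we repeated this sampling procedure many times, about 95% of the intervals constructed this way would contain the true difference.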

Since this interval does not contain the value “0”, we have evidence at the 95% confidence level that there is a true difference in the proportion of residents who support this law in county A compared to county B.
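One way to build intuition for what "95% confidence" means is to simulate the sampling process many times and count how often the interval captures the true difference. The sketch below is only an illustration: the true proportions (0.60 and 0.45), the number of trials, and the function names are all assumptions made up for the simulation, not values from the example above.

```python
import math
import random

def ci_95(x1, n1, x2, n2):
    """95% C.I. for the difference in proportions, given success counts."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - 1.96 * se, diff + 1.96 * se

def simulate_coverage(true_p1, true_p2, n1, n2, trials, seed=0):
    """Share of simulated intervals that contain the true difference."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_diff = true_p1 - true_p2
    hits = 0
    for _ in range(trials):
        # Draw one random sample from each (hypothetical) population
        x1 = sum(rng.random() < true_p1 for _ in range(n1))
        x2 = sum(rng.random() < true_p2 for _ in range(n2))
        lower, upper = ci_95(x1, n1, x2, n2)
        if lower <= true_diff <= upper:
            hits += 1
    return hits / trials

coverage = simulate_coverage(0.60, 0.45, n1=100, n2=100, trials=2000)
print(f"Share of intervals containing the true difference: {coverage:.3f}")
```

The printed share should land close to 0.95, though it will not match it exactly because the formula is an approximation and the simulation uses a finite number of trials.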
