How do I perform a statistical comparison of two datasets?


Often you may want to perform a statistical comparison of two datasets in Excel to understand how the distribution of values in each dataset differs.

There are two common ways to perform a statistical comparison:

Method 1: Calculate the Five Number Summary of Each Dataset

We can calculate the five number summary of each dataset, which consists of the following values:

  • The minimum value
  • The first quartile (25th percentile)
  • The median (50th percentile)
  • The third quartile (75th percentile)
  • The maximum

By calculating these five values, we can gain a good understanding of the distribution of values in each dataset.
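For readers who want to check the same five values outside of Excel, here is a minimal Python sketch. The function name `five_number_summary` and the sample scores are hypothetical and are not taken from this article's data. It assumes that the "inclusive" quantile method in Python's standard library interpolates the same way Excel's QUARTILE function does, which should hold for typical inputs.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    data = sorted(values)
    # method="inclusive" treats the data as containing its own min and max
    # and interpolates linearly between neighboring points (assumed here to
    # correspond to Excel's QUARTILE / QUARTILE.INC).
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median(data), q3, max(data))


# Hypothetical scores, used only for illustration.
scores = [10, 20, 30, 40]
print(five_number_summary(scores))
```

Calling the function once per dataset gives two tuples that can be compared value by value.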

Method 2: Calculate the Average and Standard Deviation

A simpler way to perform a statistical comparison of two datasets is to calculate the average and standard deviation of each dataset.

This helps us understand roughly where the “center” value is located and how spread out the values are in each dataset.
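As a rough illustration of this second method, the sketch below computes the mean and the sample standard deviation in Python. The function name `center_and_spread` and the sample values are hypothetical. `statistics.stdev` divides by n - 1, which is the same sample convention that Excel's STDEV function uses.

```python
from statistics import mean, stdev


def center_and_spread(values):
    """Return (mean, sample standard deviation) for a sequence of numbers."""
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(values), stdev(values)


# Hypothetical values, used only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]
avg, sd = center_and_spread(values)
print(avg, round(sd, 2))
```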

The following example shows how to use each of these methods in practice.

Example: Perform Statistical Comparison of Two Datasets in Excel

Suppose we have two datasets in Excel that show the exam scores students in two different classes received on a particular exam:

We can type the following formulas into cells in column E to calculate the five-number summary of the exam scores for Class 1:

  • E2: =MIN(A2:A21)
  • E3: =QUARTILE(A2:A21, 1)
  • E4: =MEDIAN(A2:A21)
  • E5: =QUARTILE(A2:A21, 3)
  • E6: =MAX(A2:A21)

We can then type the following formulas into cells in column E to calculate the average and standard deviation of the exam scores for Class 1:

  • E8: =AVERAGE(A2:A21)
  • E9: =STDEV(A2:A21)

We can then click and drag all of these formulas one column to the right to calculate the same values for Class 2:

[Figure: statistical comparison of two datasets in Excel]

We can draw the following conclusions from this statistical comparison of the two datasets:

Conclusion 1: The two datasets have a similar “central” value.

Both datasets have a median exam score of 81. The mean values are only slightly different – the first class has an average exam score of 80.65 while the second class has an average exam score of 80.25.

This tells us that the “central” or “typical” exam score is similar for the two classes.

Conclusion 2: The first dataset has much greater “spread” in values.

There are several metrics that tell us the exam scores for the first class are much more spread out compared to the exam scores in the second class.

For example, the range of Class 1 is much larger:

  • Range of Class 1: 96 – 65 = 31
  • Range of Class 2: 91 – 71 = 20

The interquartile range of Class 1 is also much larger:

  • Interquartile Range of Class 1: 90.25 – 71 = 19.25
  • Interquartile Range of Class 2: 84.25 – 74.75 = 9.5

The standard deviation of Class 1 is also much larger:

  • Standard Deviation of Class 1: 10.21
  • Standard Deviation of Class 2: 6.43

Each of these metrics tells us that the exam scores for students in Class 1 are much more spread out than the exam scores for students in Class 2.
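The range and interquartile range figures above can be reproduced from the five number summaries with simple subtraction. The sketch below uses the minimum, quartile, median, and maximum values reported in this example; the function name `spread_from_summary` is hypothetical.

```python
def spread_from_summary(summary):
    """Return the range and interquartile range of a five number summary."""
    minimum, q1, _median, q3, maximum = summary
    return {"range": maximum - minimum, "iqr": q3 - q1}


# (min, Q1, median, Q3, max) as reported in the example above.
class_1 = (65, 71, 81, 90.25, 96)
class_2 = (71, 74.75, 81, 84.25, 91)

print(spread_from_summary(class_1))
print(spread_from_summary(class_2))
```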
