What are measures of dispersion?

Measures of dispersion are statistical calculations used to measure how much variation exists in a set of data. Examples of measures of dispersion include range, variance, standard deviation, and interquartile range. These measures are used to gain insight into the spread of the data and can be used to compare two or more sets of data.


When we analyze a dataset, we often care about two things:

1. Where the “center” value is located. We often measure the “center” using the mean and median.

2. How “spread out” the values are. We measure “spread” using range, interquartile range, variance, and standard deviation.

Range

The range is the difference between the largest and smallest value in a dataset.

Suppose we have this dataset of final math exam scores for 20 students:

58, 66, 71, 73, 74, 77, 78, 82, 84, 85, 88, 88, 88, 90, 90, 92, 92, 94, 96, 98


The largest value is 98. The smallest value is 58. Thus, the range is 98 – 58 = 40.
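As a quick check of the arithmetic above, here is a minimal Python sketch. The variable names `scores` and `score_range` are illustrative choices, not part of the original article.

```python
# The 20 final math exam scores used in this article.
scores = [58, 66, 71, 73, 74, 77, 78, 82, 84, 85,
          88, 88, 88, 90, 90, 92, 92, 94, 96, 98]

# Range = largest value minus smallest value.
score_range = max(scores) - min(scores)
print(score_range)  # 40
```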

Interquartile Range

The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1) in a dataset: IQR = Q3 – Q1.

Quartiles are values that split up a dataset into four equal parts. Here is how to find the interquartile range of the following dataset of exam scores:

1. Arrange the values from smallest to largest.

58, 66, 71, 73, 74, 77, 78, 82, 84, 85, 88, 88, 88, 90, 90, 92, 92, 94, 96, 98

2. Find the median. In this case, it is the average of the middle two values, 85 and 88, which is 86.5.

58, 66, 71, 73, 74, 77, 78, 82, 84, 85 (MEDIAN) 88, 88, 88, 90, 90, 92, 92, 94, 96, 98

3. The median splits the dataset into two halves. The median of the lower half is the lower quartile (Q1) and the median of the upper half is the upper quartile (Q3).

Lower half: 58, 66, 71, 73, 74, 77, 78, 82, 84, 85
Upper half: 88, 88, 88, 90, 90, 92, 92, 94, 96, 98

In this case, Q1 is the average of the middle two values in the lower half of the dataset, (74 + 77) / 2 = 75.5, and Q3 is the average of the middle two values in the upper half, (90 + 92) / 2 = 91.

Thus, the interquartile range is 91 – 75.5 = 15.5.
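The three steps above can be sketched in Python roughly as follows. The helper name `iqr_median_of_halves` is hypothetical, and it implements only the "median of each half" convention described in this article. Other quartile conventions, such as the interpolation methods offered by `statistics.quantiles`, can give slightly different values for the same data.

```python
from statistics import median

def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method."""
    ordered = sorted(values)            # Step 1: arrange smallest to largest
    half = len(ordered) // 2
    lower = ordered[:half]              # Step 3: lower half of the data
    # For an odd number of values, the middle value is left out of both halves.
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1

scores = [58, 66, 71, 73, 74, 77, 78, 82, 84, 85,
          88, 88, 88, 90, 90, 92, 92, 94, 96, 98]

print(median(scores))                   # Step 2: overall median, 86.5
q1, q3, iqr = iqr_median_of_halves(scores)
print(q1, q3, iqr)                      # 75.5 91.0 15.5
```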

Interquartile Range vs. Range

The interquartile range is more resistant to outliers than the range, which can make it a better metric for measuring “spread.”

For example, suppose we have the following dataset with incomes for ten people:

[Table: incomes of ten people, comparing the range to the interquartile range]

The range is $2,468,000, but the interquartile range is $34,000, which is a much better indication of how spread out the incomes actually are.

In this case, the outlier income of person J causes the range to be extremely large and makes it a poor indicator of “spread” for these incomes.
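To see this effect in code, here is a small sketch using made-up numbers. The `incomes` list below is purely hypothetical illustration data, not the dataset from this article, so its range and interquartile range differ from the figures quoted above. The quartiles are computed with the same median-of-halves approach used earlier.

```python
from statistics import median

# HYPOTHETICAL incomes for ten people; the last value is a deliberate outlier.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000,
           55_000, 60_000, 65_000, 70_000, 2_500_000]

ordered = sorted(incomes)
half = len(ordered) // 2               # ten values, so two halves of five

income_range = ordered[-1] - ordered[0]
income_iqr = median(ordered[half:]) - median(ordered[:half])

print(income_range)  # 2470000, dominated by the single outlier
print(income_iqr)    # 25000, unaffected by how large the outlier is
```

Changing the outlier to any other very large value changes the range by the same amount but leaves the interquartile range at 25,000.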

Variance

The variance is a common way to measure how spread out data values are.

The formula to find the variance of a population (denoted as $\sigma^2$) is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ is the population mean, $x_i$ is the $i$th element from the population, $N$ is the population size, and $\Sigma$ is just a fancy symbol that means “sum.”

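Usually we work with samples, not populations. The formula to find the variance of a sample (denoted as $s^2$) is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean, $x_i$ is the $i$th element from the sample, and $n$ is the sample size.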
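As an illustration, the sketch below applies both variance formulas to the exam scores from earlier in this article. Treating the same 20 scores first as a full population and then as a sample is an assumption made only to show how the two divisors differ. The results are compared against `pvariance` and `variance` from Python's standard `statistics` module.

```python
import statistics

scores = [58, 66, 71, 73, 74, 77, 78, 82, 84, 85,
          88, 88, 88, 90, 90, 92, 92, 94, 96, 98]

n = len(scores)
mean = sum(scores) / n                                  # 83.2

# Sum of squared deviations from the mean, shared by both formulas.
squared_deviations = sum((x - mean) ** 2 for x in scores)

population_variance = squared_deviations / n            # divide by N
sample_variance = squared_deviations / (n - 1)          # divide by n - 1

print(round(population_variance, 2))  # 107.76
print(round(sample_variance, 2))      # 113.43

# The standard library gives the same results, up to floating-point rounding.
print(round(statistics.pvariance(scores), 2))  # 107.76
print(round(statistics.variance(scores), 2))   # 113.43
```

The sample variance is always a little larger than the population variance for the same data, because it divides by n - 1 instead of n.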

Standard Deviation

The standard deviation is the square root of the variance. It’s the most common way to measure how “spread out” data values are.

The formula to find the standard deviation of a population (denoted as $\sigma$) is:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

And the formula to find the standard deviation of a sample (denoted as $s$) is:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
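A minimal sketch of both standard deviation formulas, again using the exam scores from earlier in this article, might look like the following. As with the variance, treating the scores as either a population or a sample is an assumption made for illustration. The standard library functions `statistics.pstdev` and `statistics.stdev` are used as a cross-check.

```python
import math
import statistics

scores = [58, 66, 71, 73, 74, 77, 78, 82, 84, 85,
          88, 88, 88, 90, 90, 92, 92, 94, 96, 98]

n = len(scores)
mean = sum(scores) / n
squared_deviations = sum((x - mean) ** 2 for x in scores)

# Standard deviation is the square root of the corresponding variance.
population_sd = math.sqrt(squared_deviations / n)
sample_sd = math.sqrt(squared_deviations / (n - 1))

print(round(population_sd, 2))  # 10.38
print(round(sample_sd, 2))      # 10.65

# Cross-check against the standard library.
print(round(statistics.pstdev(scores), 2))  # 10.38
print(round(statistics.stdev(scores), 2))   # 10.65
```

Because the standard deviation is in the same units as the data (here, exam points), it is usually easier to interpret than the variance, which is in squared units.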