What does variability in box plots represent?

A box plot is a visual representation of the distribution of a dataset using five key values: the minimum, the first quartile, the median, the third quartile, and the maximum. Variability in a box plot refers to how spread out the data points are. It is shown by the length of the box, the length of the whiskers, and the presence of outliers. A longer box or longer whiskers indicate a larger spread in the data, while outliers indicate extreme values that deviate from the overall pattern. Together, these features summarize the range and distribution of the data and make it easy to compare different datasets.
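To see how the whiskers and outliers are determined in practice, here is a minimal sketch in R. It assumes R's default `boxplot()` rule, implemented in `boxplot.stats()`, under which a point is drawn as an outlier if it lies more than 1.5 box lengths beyond either end of the box. The vector `x` is made-up illustrative data.

```r
# Hypothetical data: nine moderate values plus one extreme value (40)
x <- c(2, 4, 4, 5, 7, 8, 9, 12, 15, 40)

# boxplot.stats() computes what boxplot() draws (coef = 1.5 by default)
s <- boxplot.stats(x)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
s$stats  # 2.0 4.0 7.5 12.0 15.0

# Points farther than 1.5 box lengths from the box, drawn as outliers
s$out    # 40
```

Here the box runs from 4 to 12 (length 8), so any value above 12 + 1.5 * 8 = 24 is flagged as an outlier, and the upper whisker stops at 15, the largest value that is not an outlier.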

Interpret Variability in Box Plots

A box plot is a type of plot that displays the five-number summary of a dataset, which includes:

The minimum value
The first quartile (the 25th percentile)
The median value
The third quartile (the 75th percentile)
The maximum value
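In R, the five-number summary can be computed directly. This is a minimal sketch on a made-up vector `x`. `fivenum()` returns Tukey's hinges (the values `boxplot()` uses for the box), while `quantile()` uses R's default percentile definition, so the two can differ slightly for some sample sizes; they agree for this vector.

```r
# Hypothetical data used only for illustration
x <- c(2, 4, 4, 5, 7, 8, 9, 12, 15)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(x)  # 2 4 7 9 15

# The same five values expressed as the 0th, 25th, 50th, 75th and 100th percentiles
quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
```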

Here is how a typical box plot looks:

The most common way to measure variation in a box plot is by analyzing the interquartile range.

The interquartile range represents the spread of the middle 50% of the data.

In a box plot, it is represented by the length of the box, which extends from the first quartile (Q1) to the third quartile (Q3).
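As a quick sketch, the interquartile range can be computed in R either from the quartiles directly or with the built-in `IQR()` function. The vector `x` is made-up illustrative data.

```r
# Hypothetical data used only for illustration
x <- c(2, 4, 4, 5, 7, 8, 9, 12, 15)

# Q1 (25th percentile) and Q3 (75th percentile)
q <- quantile(x, probs = c(0.25, 0.75))

# IQR = Q3 - Q1
iqr_manual <- unname(q[2] - q[1])  # 9 - 4 = 5

# Base R's IQR() uses the same default quantile definition (type 7)
iqr_builtin <- IQR(x)  # 5
```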

We often draw multiple box plots in a single figure to compare the distributions of several datasets at once.

The following example shows how to compare the variability between several box plots in practice.

Note: We prefer to use the interquartile range to measure variability in box plots instead of the range (max value – min value) because the interquartile range is much less sensitive to outliers.
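The following minimal sketch illustrates this on made-up data: replacing the largest value with a hypothetical extreme value of 150 changes the range dramatically but leaves the interquartile range unchanged.

```r
# Hypothetical data used only for illustration
x <- c(2, 4, 4, 5, 7, 8, 9, 12, 15)

# Same data, but with the largest value replaced by an extreme outlier
x_outlier <- replace(x, length(x), 150)

# The range (max - min) is driven entirely by the two most extreme values
diff(range(x))          # 13
diff(range(x_outlier))  # 148

# The IQR depends only on values near the 25th and 75th percentiles
IQR(x)          # 5
IQR(x_outlier)  # 5
```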

Example: How to Analyze Variability in Box Plots

Suppose we collect data on the points scored by basketball players on three different teams.

We can then create the following three side-by-side box plots to visualize the distribution of points scored by players on each team:

From the box plots we can see that Team B has the greatest variation in points scored because its box has the greatest distance between its two ends (that is, the largest interquartile range).

The interquartile range for Team B is roughly 21 – 12 = 9.

The interquartile range for Team C is roughly 27 – 23 = 4. Team C has the shortest box, so it has the least variation in points scored.
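The values above are read off the plot by eye. As a sketch, they can also be computed exactly from the same data used in the R code at the end of this example. Note that `boxplot()` draws the box between Tukey's hinges, while `IQR()` uses R's default quantile definition, so the two measures differ slightly; both still rank Team B as the most variable and Team C as the least.

```r
# Data from the example (points scored by players on three teams)
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 8),
                 points = c(5, 5, 6, 6, 8, 9, 13, 15,
                            11, 11, 12, 14, 15, 19, 22, 24,
                            19, 23, 23, 23, 24, 26, 29, 33))

# Compute the box plot statistics without drawing anything
bp <- boxplot(points ~ team, data = df, plot = FALSE)

# Row 2 = lower hinge, row 4 = upper hinge, so this is the drawn box length
box_length <- setNames(bp$stats[4, ] - bp$stats[2, ], bp$names)
box_length  # A: 5.5, B: 9, C: 4.5

# IQR per team using R's default (type 7) quantiles
iqr <- sapply(split(df$points, df$team), IQR)
iqr  # A: 4.25, B: 8, C: 3.75

# Most and least variable team under each measure
names(which.max(box_length)); names(which.min(box_length))
names(which.max(iqr)); names(which.min(iqr))
```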

This example demonstrates the benefit of using box plots to analyze variability in datasets.

By simply looking at several box plots side by side, we are able to visually compare the variability in the underlying data.

Note: Here is the exact code that we used to generate these side-by-side box plots in R:

#create data frame
df <- data.frame(team=rep(c('A', 'B', 'C'), each=8),
                 points=c(5, 5, 6, 6, 8, 9, 13, 15,
                          11, 11, 12, 14, 15, 19, 22, 24,
                          19, 23, 23, 23, 24, 26, 29, 33))

#create vertical side-by-side boxplots
boxplot(df$points ~ df$team,
        col='steelblue',
        main='Points by Team',
        xlab='Team',
        ylab='Points')

