How Can Variability Be Interpreted in Box Plots?

Box plots are a visual representation of the variability of a dataset. The box in the plot shows the middle 50% of the data, with the median represented by a line inside it. The whiskers extend from the box toward the minimum and maximum values, and points that lie unusually far from the box are drawn individually as outliers (in R's default, any point more than 1.5 times the interquartile range beyond the box). By examining the length of the box, one can interpret the spread of the data: a shorter box indicates less variability in the middle 50% of the data, while a longer box indicates more. The position of the median within the box describes skewness rather than spread; a median close to one end of the box suggests a skewed distribution. Outliers can also provide insight into extreme values that may be influencing the overall variability of the data. Overall, box plots provide a clear and concise way to interpret variability in a dataset.
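As a minimal sketch of the whisker rule described above, the R code below passes a made-up vector `x` to base R's `boxplot.stats()`, which computes the statistics that `boxplot()` draws, using its default cutoff of 1.5 times the box length. The data are hypothetical and chosen so that one value falls beyond the upper whisker.

```r
# Hypothetical data: eight moderate values and one extreme value (30)
x <- c(2, 4, 5, 5, 6, 7, 8, 9, 30)

# boxplot.stats() applies the rule boxplot() uses (coef = 1.5 by default):
# whiskers stop at the most extreme points within 1.5 * box length of the box
bs <- boxplot.stats(x)

# Lower whisker end, lower hinge, median, upper hinge, upper whisker end
bs$stats
#> [1] 2 5 6 8 9

# Points beyond the whiskers, drawn individually as outliers
bs$out
#> [1] 30
```

Here the box runs from 5 to 8, so any point above 8 + 1.5 × 3 = 12.5 is flagged, and the upper whisker stops at 9.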

A box plot is a type of plot that displays the five-number summary of a dataset, which includes:

The minimum value
The first quartile (the 25th percentile)
The median value
The third quartile (the 75th percentile)
The maximum value
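Base R can compute this summary directly. A minimal sketch, using the Team B points from the example later in this article (stored under the illustrative name `points_b`). Note that `fivenum()` reports Tukey's hinges, which are what `boxplot()` uses for the box edges and can differ slightly from the 25th and 75th percentiles that `quantile()` returns by default:

```r
# Points scored by Team B in the example later in this article
points_b <- c(11, 11, 12, 14, 15, 19, 22, 24)

# fivenum() returns Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum
fivenum(points_b)
#> [1] 11.0 11.5 14.5 20.5 24.0
```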

Here is how a typical box plot looks:

[Figure: a typical box plot]

The most common way to measure variation in a box plot is by analyzing the interquartile range.

The interquartile range represents the spread of the middle 50% of the data.

In a box plot, it is represented by the length of the box along the value axis (its height in a vertical box plot), which spans from the first quartile (Q1) to the third quartile (Q3).
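A short sketch of the calculation in base R, again using the illustrative name `points_b` for Team B's points from the example below. `IQR()` uses the default method of `quantile()`, while the box that `boxplot()` draws spans Tukey's hinges, so the two lengths can differ for small samples:

```r
points_b <- c(11, 11, 12, 14, 15, 19, 22, 24)

# Q1 and Q3 with quantile()'s default method (type 7)
q <- quantile(points_b, probs = c(0.25, 0.75))
q

# IQR() returns Q3 - Q1 using the same default method
IQR(points_b)
#> [1] 8

# Length of the box drawn by boxplot(): upper hinge minus lower hinge
diff(fivenum(points_b)[c(2, 4)])
#> [1] 9
```

Either number is a valid measure of the spread of the middle half of the data; just compare like with like.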

[Figure: variability in box plots]

We often draw multiple box plots on a single chart to compare the distributions of several datasets at once.

The following example shows how to compare the variability between several box plots in practice.

Note: We prefer to use the interquartile range to measure variability in box plots instead of the range (max value – min value) because the interquartile range is resistant to outliers, whereas the range depends entirely on the two most extreme values.
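To see the difference, compare both measures on a made-up vector `x` before and after appending one extreme value (60). This is an illustrative sketch, not data from the example:

```r
# Hypothetical data, then the same data with one extreme value added
x <- c(10, 12, 13, 14, 15, 16, 18)
x_out <- c(x, 60)

# The range (max - min) jumps because it depends only on the two extremes
diff(range(x))
#> [1] 8
diff(range(x_out))
#> [1] 50

# The IQR barely moves
IQR(x)
#> [1] 3
IQR(x_out)
#> [1] 3.75
```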

Example: How to Analyze Variability in Box Plots

Suppose we collect data on the points scored by basketball players on three different teams.

We then create the following three side-by-side box plots to visualize the distribution of points scored by the players on each team:

[Figure: side-by-side box plots of points by team (Teams A, B and C)]

From the box plots we can see that Team B has the greatest variation in points scored, because its box is the longest (the greatest distance between Q1 and Q3). Team C has the least variation, with the shortest box.

The interquartile range for Team B is roughly 21 – 12 = 9.

The interquartile range for Team C is roughly 27 – 23 = 4.
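These rough readings can be checked numerically. A minimal sketch that rebuilds the example's data frame and computes, per team, the box length drawn by `boxplot()` (upper hinge minus lower hinge from `fivenum()`) alongside `IQR()`; the names `box_len` and `iqr_by_team` are illustrative:

```r
# Same data as the example's data frame
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 8),
                 points = c(5, 5, 6, 6, 8, 9, 13, 15,
                            11, 11, 12, 14, 15, 19, 22, 24,
                            19, 23, 23, 23, 24, 26, 29, 33))

# Box length as drawn by boxplot(): upper hinge minus lower hinge, per team
box_len <- tapply(df$points, df$team, function(p) diff(fivenum(p)[c(2, 4)]))
box_len

# IQR() uses quantile()'s default method, so the values differ slightly
iqr_by_team <- tapply(df$points, df$team, IQR)
iqr_by_team
```

With this data the hinge-based box lengths are 5.5, 9 and 4.5 for Teams A, B and C, and `IQR()` gives 4.25, 8 and 3.75. The exact numbers depend on the quartile method, but both rank Team B as the most variable and Team C as the least.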

This example demonstrates the benefit of using box plots to analyze variability in datasets.

By simply looking at several box plots side by side, we are able to visually compare the variability in the underlying data.

Note: Here is the exact code that we used to generate these side-by-side box plots in R:

#create data frame
df <- data.frame(team=rep(c('A', 'B', 'C'), each=8),
                 points=c(5, 5, 6, 6, 8, 9, 13, 15,
                          11, 11, 12, 14, 15, 19, 22, 24,
                          19, 23, 23, 23, 24, 26, 29, 33))

#create vertical side-by-side boxplots
boxplot(df$points ~ df$team,
        col='steelblue',
        main='Points by Team',
        xlab='Team',
        ylab='Points')

Additional Resources

The following tutorials provide additional information about box plots:
