
Box plots are a graphical summary of a dataset built from five values: the minimum, lower quartile, median, upper quartile, and maximum. A single box plot shows a dataset’s central tendency, its spread, and the shape (symmetry or skew) of its distribution. Placing several box plots side by side makes it easy to compare groups of data and see how they differ.

A box plot is a type of plot that displays the five-number summary of a dataset, which consists of:

The minimum value
The first quartile (the 25th percentile)
The median value
The third quartile (the 75th percentile)
The maximum value
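As a minimal Python sketch, the five-number summary can be computed as follows. It assumes the common textbook convention in which Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the overall median when the count is odd). Statistical software uses several other quartile conventions, so its Q1 and Q3 may differ slightly. The function name and sample values are illustrative only.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers.

    Q1 and Q3 are taken as the medians of the lower and upper halves of the
    sorted data; for an odd count, the middle value is left out of both halves.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = values[:half]               # values below the median position
    upper = values[half + n % 2:]       # skip the middle value when n is odd
    return (values[0], median(lower), median(values), median(upper), values[-1])

print(five_number_summary([1, 3, 4, 7, 8, 10, 12]))  # → (1, 3, 7, 10, 12)
```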

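To make a box plot, we draw a box from the first quartile to the third quartile. Then we draw a line across the box at the median. Lastly, we draw “whiskers” from the quartiles out to the minimum and maximum values. When outliers are marked separately (see below), the whiskers usually stop at the most extreme values that are not outliers.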
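The three drawing steps can be mimicked in plain text. The sketch below uses a hypothetical helper, `text_box_plot`, that maps each value of a made-up five-number summary onto a fixed-width character scale and then draws the whiskers, the box, and the median line in that order. It is only an illustration; real box plots would normally come from a plotting library.

```python
def text_box_plot(minimum, q1, med, q3, maximum, low, high, width=41):
    """Draw a horizontal box plot of a five-number summary as one line of text.

    Each value is mapped linearly onto one of `width` character columns that
    span the scale [low, high]; all five values must lie within that scale.
    """
    if low == high or not (low <= minimum <= q1 <= med <= q3 <= maximum <= high):
        raise ValueError("expected low <= min <= Q1 <= median <= Q3 <= max <= high")

    def col(value):
        # Linear mapping from the data scale to a column index
        return round((value - low) / (high - low) * (width - 1))

    line = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        line[c] = "-"                  # whiskers run from the minimum to the maximum
    for c in range(col(q1), col(q3) + 1):
        line[c] = "="                  # box runs from Q1 to Q3
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(minimum)] = "|"           # whisker end caps
    line[col(maximum)] = "|"
    line[col(med)] = "|"               # median line inside the box
    return "".join(line).rstrip()


print(text_box_plot(1, 3, 7, 10, 12, low=0, high=20, width=21))
```

Drawing the median last guarantees it stays visible even when it falls on a box edge.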

Box plots are useful because they allow us to gain a quick understanding of the distribution of values in a dataset. They’re also useful for comparing two or more datasets.

When comparing two or more box plots, we can answer four different questions:

1. How do the median values compare? We can compare the median line inside each box to determine which dataset has a higher median value.

2. How does the dispersion compare? We can compare the length of each box (which represents the distance between Q1 and Q3 – the interquartile range) to determine which dataset is more spread out.

3. How does the skewness compare? The closer the median line is to Q1, the more positively (right) skewed the dataset. The closer the median line is to Q3, the more negatively (left) skewed the dataset.

4. Are outliers present? In box plots, outliers are typically drawn as small circles beyond the end of either whisker. Under the common 1.5 × IQR rule, an observation is defined to be an outlier if it meets either of the following criteria:

An observation is less than Q1 – 1.5*IQR
An observation is greater than Q3 + 1.5*IQR
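Questions 2 through 4 can also be answered numerically once Q1, the median, and Q3 are known. In this minimal sketch (function names and sample data are hypothetical), the IQR measures the box length, and the outlier check applies the 1.5 × IQR fences. The skewness function uses Bowley's quartile skewness coefficient, (Q3 + Q1 - 2 * median) / (Q3 - Q1), which turns the position of the median line into a number between -1 and 1. It is positive when the median is closer to Q1 and negative when it is closer to Q3.

```python
def quartile_skewness(q1, med, q3):
    """Bowley skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1).

    Positive when the median is closer to Q1 (positive skew), negative when
    it is closer to Q3 (negative skew), and 0 when it is centered in the box.
    """
    if q3 == q1:
        raise ValueError("Q1 equals Q3, so the box has zero length")
    return (q3 + q1 - 2 * med) / (q3 - q1)


def find_outliers(data, q1, q3, k=1.5):
    """Return observations below Q1 - k*IQR or above Q3 + k*IQR."""
    iqr = q3 - q1                      # box length (interquartile range)
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical sample: Q1 = 4, median = 5, Q3 = 7 (medians of the lower and upper halves)
sample = [3, 4, 5, 5, 6, 7, 30]
print(quartile_skewness(4, 5, 7))          # median closer to Q1, so positive (about 0.33)
print(find_outliers(sample, q1=4, q3=7))   # fences are -0.5 and 11.5 → [30]
```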

The following example shows how to compare two different box plots and answer these four questions.

Example: Comparing Box Plots

The following datasets display the exam scores for students who used one of two studying techniques to prepare for the exam:

Method 1: 78, 78, 79, 80, 80, 82, 82, 83, 83, 86, 86, 86, 86, 87, 87, 87, 88, 88, 88, 91

Method 2: 66, 66, 66, 67, 68, 70, 72, 75, 75, 78, 82, 83, 86, 88, 89, 90, 93, 94, 95, 98

Suppose we create a box plot for each dataset and place the two side by side, with exam score on the vertical axis.

We can compare these two box plots and answer the following four questions:

1. How do the median values compare? The line in the middle of the box plot for Study Method 1 is higher than the line for Study Method 2, which indicates that the students who used Study Method 1 had a higher median exam score.

2. How does the dispersion compare? The box for Study Method 2 is much longer than the box for Study Method 1, which indicates that exam scores are much more spread out among students who used Study Method 2.

3. How does the skewness compare? The line in the middle of the box plot for Study Method 1 is close to Q3, which indicates that the distribution of exam scores for students who used Study Method 1 is negatively skewed. Conversely, the line in the middle of the box plot for Study Method 2 is near the center of the box, which means that its distribution of scores has little skew.

4. Are outliers present? Neither box plot has tiny circles that extend beyond the top or bottom whiskers, which means neither dataset had any clear outliers.
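The four comparisons can be checked numerically. The sketch below uses a hypothetical helper, `summarize`, to compute each dataset's median, its quartiles (as medians of the lower and upper halves), its IQR, Bowley's quartile skewness, and any values beyond the 1.5 × IQR fences. The halves method is one of several quartile conventions, so other software may report slightly different Q1 and Q3 values.

```python
from statistics import median

method_1 = [78, 78, 79, 80, 80, 82, 82, 83, 83, 86,
            86, 86, 86, 87, 87, 87, 88, 88, 88, 91]
method_2 = [66, 66, 66, 67, 68, 70, 72, 75, 75, 78,
            82, 83, 86, 88, 89, 90, 93, 94, 95, 98]

def summarize(data):
    """Median, quartiles, IQR, quartile skewness, and 1.5*IQR outliers."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    med = median(values)
    q1 = median(values[:half])             # median of the lower half
    q3 = median(values[half + n % 2:])     # median of the upper half
    iqr = q3 - q1
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Bowley skewness: negative when the median is closer to Q3
        "skew": (q3 + q1 - 2 * med) / iqr,
        "outliers": [x for x in values
                     if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr],
    }

for name, data in [("Method 1", method_1), ("Method 2", method_2)]:
    s = summarize(data)
    print(f"{name}: median={s['median']}, Q1={s['q1']}, Q3={s['q3']}, "
          f"IQR={s['iqr']}, skew={s['skew']:.2f}, outliers={s['outliers']}")
```

Under this convention, Method 1 has a median of 86 and an IQR of 6 (Q1 = 81, Q3 = 87), while Method 2 has a median of 80 and an IQR of 20.5 (Q1 = 69, Q3 = 89.5). The skewness coefficients are about -0.67 and -0.07, and neither dataset has values outside its fences, which agrees with the four answers above.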
