How can the stat_summary() function be used in ggplot2 to summarize data in a graphical representation?

The stat_summary() function in ggplot2 computes a summary statistic, such as the mean, median, minimum, or maximum, of the y values at each unique x value and draws the result with a geom of your choice. Because the summary is calculated inside the plotting layer, there is no need to aggregate the data frame beforehand. The same call can produce bars, points, lines, or error bars simply by changing the geom argument, which makes it a convenient tool for data exploration and analysis.

How to Use the stat_summary() Function in ggplot2

You can use the stat_summary() function in ggplot2 to create visualizations that display summary metrics of specific variables in a data frame.

The examples below show how to use the stat_summary() function in practice with this data frame in R:

#create data frame
df <- data.frame(team=rep(c('A', 'B', 'C'), each=4),
                 points=c(8, 12, 4, 6, 26, 21, 25, 20, 9, 18, 14, 14))

#view data frame
df

   team points
1     A      8
2     A     12
3     A      4
4     A      6
5     B     26
6     B     21
7     B     25
8     B     20
9     C      9
10    C     18
11    C     14
12    C     14

Example 1: Use stat_summary() to Visualize Mean Values with Bar Plot

The following code shows how to use the stat_summary() function to visualize the mean value in the points column of the data frame, grouped by the team column:

library(ggplot2)
library(dplyr)

#create bar plot to visualize mean points by team
df %>% 
  ggplot(aes(x=team, y=points)) +
  stat_summary(fun='mean', geom='bar') 

The bars in the bar plot represent the mean points value for each unique team value.

Notice that we used the fun argument within stat_summary() to specify the summary function to use and we used the geom argument to specify the geometric shape to use in the plot.
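
To confirm what the bars show, the same means can be computed directly. The following is a minimal sketch, not part of the original example: it rebuilds df so that it runs on its own, uses base R's aggregate() to calculate the mean by team, and uses ggplot2's layer_data() to read back the values that stat_summary() computed. The object names p, team_means, and bar_heights are only illustrative.

```r
library(ggplot2)

#recreate the data frame so the block is self-contained
df <- data.frame(team=rep(c('A', 'B', 'C'), each=4),
                 points=c(8, 12, 4, 6, 26, 21, 25, 20, 9, 18, 14, 14))

#same bar plot as above, stored in an object
p <- ggplot(df, aes(x=team, y=points)) +
  stat_summary(fun='mean', geom='bar')

#mean points by team, computed directly from the data frame
team_means <- aggregate(points ~ team, data=df, FUN=mean)
team_means

#values that stat_summary() computed for the first layer of the plot
bar_heights <- layer_data(p)$y
bar_heights
```

Both approaches should return 7.5 for team A, 23 for team B, and 13.75 for team C.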

Example 2: Use stat_summary() to Visualize Mean Values with Scatter Plot

The following code shows how to use the stat_summary() function to visualize the mean value in the points column of the data frame, grouped by the team column, using points as the geometric shape:

library(ggplot2)
library(dplyr)

#create plot with points to visualize mean points by team
df %>% 
  ggplot(aes(x=team, y=points)) +
  stat_summary(fun='mean', geom='point')

Notice that we set geom='point' within the stat_summary() function so that each team’s mean is drawn as a single point instead of a bar. The geom name is singular: 'point', not 'points'.
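
Summary layers can also be stacked. The sketch below is an illustrative extension and not one of the original examples: it keeps the mean points from Example 2 and adds error bars through the fun.data argument with mean_se(), a helper that ships with ggplot2 and returns the mean plus and minus one standard error. The size and width values are arbitrary choices.

```r
library(ggplot2)

#recreate the data frame so the block is self-contained
df <- data.frame(team=rep(c('A', 'B', 'C'), each=4),
                 points=c(8, 12, 4, 6, 26, 21, 25, 20, 9, 18, 14, 14))

#layer 1: mean points by team
#layer 2: error bars showing mean +/- one standard error
p <- ggplot(df, aes(x=team, y=points)) +
  stat_summary(fun='mean', geom='point', size=3) +
  stat_summary(fun.data=mean_se, geom='errorbar', width=0.2)
p
```

Because fun.data expects a function that returns a data frame with y, ymin, and ymax columns, it pairs naturally with geoms such as errorbar or pointrange, which need a lower and an upper bound.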

Example 3: Use stat_summary() to Visualize Minimum Values with Bar Plot

The following code shows how to use the stat_summary() function to visualize the minimum value in the points column of the data frame, grouped by the team column:

library(ggplot2)
library(dplyr)

#create bar plot to visualize minimum points by team
df %>% 
  ggplot(aes(x=team, y=points)) +
  stat_summary(fun='min', geom='bar') 

Notice that we used the fun argument within the stat_summary() function to specify that we’d like to use the minimum as the summary function.
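
The fun argument is not limited to the names of built-in functions. As a hedged sketch (the helper name point_spread is hypothetical and does not come from the original examples), any function that takes a numeric vector and returns a single number can be passed instead, for example the difference between the highest and lowest score on each team:

```r
library(ggplot2)

#recreate the data frame so the block is self-contained
df <- data.frame(team=rep(c('A', 'B', 'C'), each=4),
                 points=c(8, 12, 4, 6, 26, 21, 25, 20, 9, 18, 14, 14))

#hypothetical helper: difference between the maximum and minimum value
point_spread <- function(x) max(x) - min(x)

#create bar plot to visualize the spread of points by team
p <- ggplot(df, aes(x=team, y=points)) +
  stat_summary(fun=point_spread, geom='bar')
p
```

With this data frame, the bars should have heights of 8, 6, and 9 for teams A, B, and C.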
