When Do You Need to Use stat="identity" in ggplot2 Plots?

The stat = "identity" argument in ggplot2 tells a geom to plot the values in your data exactly as they are, rather than first passing them through a statistical transformation such as counting, binning, or smoothing. It is most often needed with geom_bar(), whose default stat counts rows, when your data already contains the bar heights that you want to display.


There are two common ways to use the geom_bar() function in ggplot2 to create bar charts:

Method 1: Use geom_bar()

ggplot(df, aes(x)) +
  geom_bar()

By default, geom_bar() will simply count the occurrences of each unique value for the x variable and use bars to display the counts.

Method 2: Use geom_bar(stat="identity")

ggplot(df, aes(x, y)) +
  geom_bar(stat="identity")

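If you provide the argument stat="identity" to geom_bar(), you're telling ggplot2 to skip the counting step and use the values of the y variable directly as bar heights. When several rows share the same x value, their bars are stacked by default, so the total height of each bar equals the sum of the y variable for that x value.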
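To make the effect of stat="identity" concrete, here is a minimal sketch that inspects the bar geometry ggplot2 computes. The demo data frame and its values are hypothetical and chosen only for illustration. It also compares the result with geom_col(), which is ggplot2's shorthand for a bar layer that uses the identity stat.

```r
library(ggplot2)

# hypothetical illustrative data (not the data frame used in this article)
demo <- data.frame(x = c("a", "a", "b"), y = c(1, 2, 4))

# stat = "identity": one rectangle per row, rows with the same x are stacked
p_bar <- ggplot(demo, aes(x, y)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for a bar layer with the identity stat
p_col <- ggplot(demo, aes(x, y)) +
  geom_col()

# layer_data() returns the data ggplot2 actually draws for a layer
bar_data <- layer_data(p_bar)
col_data <- layer_data(p_col)

# one row per input row; ymin/ymax show how the rectangles are stacked
bar_data[, c("x", "ymin", "ymax")]
```

The rectangles for "a" sit on top of each other, which is why the visible bar ends at the sum of its y values.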

The examples below illustrate the difference between these two methods using a data frame in R that shows the points scored by basketball players on various teams:

#create data frame
df <- data.frame(team=rep(c('A', 'B', 'C'), each=4),
                 points=c(3, 5, 5, 6, 5, 7, 7, 8, 9, 9, 9, 8))

#view data frame
df

   team points
1     A      3
2     A      5
3     A      5
4     A      6
5     B      5
6     B      7
7     B      7
8     B      8
9     C      9
10    C      9
11    C      9
12    C      8

Example 1: Using geom_bar()

The following code shows how to use the geom_bar() function to create a bar chart that displays the count of each unique value in the team column:

library(ggplot2)

#create bar chart to visualize occurrence of each unique value in team column
ggplot(df, aes(team)) +
  geom_bar()

The x-axis displays the unique values in the team column and the y-axis displays the number of times each unique value occurred.

Since each unique value occurred 4 times, the height of each bar is 4 in the plot.
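As a quick check, the bar heights can be reproduced outside the plot. The sketch below counts the rows per team with base R's table() and then reads the count column that geom_bar() computes through its default stat. The object names team_counts and p_count are chosen here for illustration.

```r
library(ggplot2)

# same data frame as above
df <- data.frame(team=rep(c('A', 'B', 'C'), each=4),
                 points=c(3, 5, 5, 6, 5, 7, 7, 8, 9, 9, 9, 8))

# count rows per team with base R
team_counts <- table(df$team)
team_counts

# read the counts that geom_bar() computes with its default stat
p_count <- ggplot(df, aes(team)) +
  geom_bar()
layer_data(p_count)$count
```

Both approaches report 4 rows for each team, matching the height of each bar.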

Example 2: Using geom_bar(stat="identity")

The following code shows how to use the geom_bar() function with the stat="identity" argument to create a bar chart that displays the sum of values in the points column, grouped by team:

library(ggplot2)

#create bar chart to visualize sum of points, grouped by team
ggplot(df, aes(team, points)) +
  geom_bar(stat="identity")

[Figure: bar chart created with geom_bar(stat="identity") in ggplot2]

The x-axis displays the unique values in the team column and the y-axis displays the sum of the values in the points column for each team.

For example:

  • The sum of points for team A is 19.
  • The sum of points for team B is 27.
  • The sum of points for team C is 35.

By using stat="identity" in the geom_bar() function, we plot the values of a variable in our data frame instead of row counts. Because the bars for rows with the same team are stacked, the total height of each bar is the sum of points for that team.
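The team totals can be confirmed by aggregating the data first. The sketch below uses base R's aggregate() to compute the sum of points per team and then plots the pre-aggregated data, so each bar corresponds to exactly one row. The names team_totals and p_totals are chosen here for illustration.

```r
library(ggplot2)

# same data frame as above
df <- data.frame(team=rep(c('A', 'B', 'C'), each=4),
                 points=c(3, 5, 5, 6, 5, 7, 7, 8, 9, 9, 9, 8))

# sum of points for each team
team_totals <- aggregate(points ~ team, data=df, FUN=sum)
team_totals

#   team points
# 1    A     19
# 2    B     27
# 3    C     35

# plot the pre-aggregated totals: one row (and one rectangle) per team
p_totals <- ggplot(team_totals, aes(team, points)) +
  geom_bar(stat="identity")
p_totals
```

Summarizing first gives bars of the same height as the stacked version, but without the internal segments that stacking creates, which matters if you add an outline color to the bars.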

Note: For stat="identity" to work properly, you must map both an x variable and a y variable inside aes().
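The sketch below shows what to expect when the y variable is left out. It assumes a reasonably recent ggplot2 release, where the missing aesthetic is reported when the plot is built or printed rather than when it is defined. The exact wording of the message varies between versions, so it is captured here instead of being quoted.

```r
library(ggplot2)

# same data frame as above
df <- data.frame(team=rep(c('A', 'B', 'C'), each=4),
                 points=c(3, 5, 5, 6, 5, 7, 7, 8, 9, 9, 9, 8))

# only x is mapped, so the identity stat has no y values to pass through
p_missing_y <- ggplot(df, aes(team)) +
  geom_bar(stat="identity")

# build the plot and capture the error message instead of stopping
result <- tryCatch(
  {
    ggplot_build(p_missing_y)
    "built without error"
  },
  error = function(e) conditionMessage(e)
)
result
```

Adding points as the second argument to aes() resolves the error.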
