How do you quickly create pivot tables in R?


In Excel, pivot tables offer an easy way to group and summarize data.

For example, if we have the following dataset in Excel then we can use a pivot table to quickly summarize the total sales by region:

This tells us:

  • Region A had 51 total sales
  • Region B had 85 total sales
  • Region C had 140 total sales

Or we could summarize by another metric such as the average sales by region:

It turns out that we can quickly create similar pivot tables in R by using the group_by() and summarize() functions from the package.

This tutorial provides several examples of how to do so.

Example: Create Pivot Tables in R

First, let’s create the same dataset in R that we used in the previous examples from Excel:

#create data frame
df <- data.frame(region=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'),
                 device=c('X', 'X', 'Y', 'X', 'Y', 'Y', 'X', 'X', 'Y', 'Y'),
                 sales=c(12, 18, 21, 22, 34, 29, 38, 36, 34, 32))

#view data frame
df

   region device sales
1       A      X    12
2       A      X    18
3       A      Y    21
4       B      X    22
5       B      Y    34
6       B      Y    29
7       C      X    38
8       C      X    36
9       C      Y    34
10      C      Y    32

Next, let’s load the dplyr package and use the group_by() and summarize() functions to group by region and find the sum of sales by region:

library(dplyr)

#find sum of sales by region
df %>%
  group_by(region) %>% 
  summarize(sum_sales = sum(sales))

# A tibble: 3 x 2
  region sum_sales
        
1 A             51
2 B             85
3 C            140

We can see that these numbers match the numbers shown in the introductory Excel example.

We can also calculate the average sales by region:

#find average sales by region
df %>%
  group_by(region) %>% 
  summarize(mean_sales = mean(sales))

# A tibble: 3 x 2
  region  mean_sales
        
1 A             17  
2 B             28.3
3 C             35  

Note that we can also group by multiple variables. For example, we could find the sum of sales grouped by region and device type:

#find sum of sales by region and device type
df %>%
  group_by(region, device) %>% 
  summarize(sum_sales = sum(sales))

# A tibble: 6 x 3
# Groups:   region [3]
  region device sum_sales
          
1 A      X             30
2 A      Y             21
3 B      X             22
4 B      Y             63
5 C      X             74
6 C      Y             66  

x