Table of Contents
In Excel, pivot tables offer an easy way to group and summarize data.
For example, if we have the following dataset in Excel then we can use a pivot table to quickly summarize the total sales by region:
This tells us:
- Region A had 51 total sales
- Region B had 85 total sales
- Region C had 140 total sales
Or we could summarize by another metric such as the average sales by region:
It turns out that we can quickly create similar pivot tables in R by using the group_by() and summarize() functions from the package.
This tutorial provides several examples of how to do so.
Example: Create Pivot Tables in R
First, let’s create the same dataset in R that we used in the previous examples from Excel:
#create data frame df <- data.frame(region=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'), device=c('X', 'X', 'Y', 'X', 'Y', 'Y', 'X', 'X', 'Y', 'Y'), sales=c(12, 18, 21, 22, 34, 29, 38, 36, 34, 32)) #view data frame df region device sales 1 A X 12 2 A X 18 3 A Y 21 4 B X 22 5 B Y 34 6 B Y 29 7 C X 38 8 C X 36 9 C Y 34 10 C Y 32
Next, let’s load the dplyr package and use the group_by() and summarize() functions to group by region and find the sum of sales by region:
library(dplyr) #find sum of sales by region df %>% group_by(region) %>% summarize(sum_sales = sum(sales)) # A tibble: 3 x 2 region sum_sales 1 A 51 2 B 85 3 C 140
We can see that these numbers match the numbers shown in the introductory Excel example.
We can also calculate the average sales by region:
#find average sales by region df %>% group_by(region) %>% summarize(mean_sales = mean(sales)) # A tibble: 3 x 2 region mean_sales 1 A 17 2 B 28.3 3 C 35
Note that we can also group by multiple variables. For example, we could find the sum of sales grouped by region and device type:
#find sum of sales by region and device type df %>% group_by(region, device) %>% summarize(sum_sales = sum(sales)) # A tibble: 6 x 3 # Groups: region [3] region device sum_sales 1 A X 30 2 A Y 21 3 B X 22 4 B Y 63 5 C X 74 6 C Y 66